MATRIX ATTACHMENT REGIONS

Field of the Invention 

The present invention relates to matrix attachment regions isolated 
from a higher plant, and to methods for isolating matrix attachment sequences. 

Background of the Invention

The proteinaceous nuclear 'matrix' or 'scaffold' in the cell nucleus plays
a role in determining chromatin structure. Electron micrographs show that nuclear DNA
is attached to this scaffold at intervals to produce a series of loops (Zlatanova and Van
Holde, J. Cell Sci. 103:889 (1992)). Matrix Attachment Regions (MARs; also referred
to as scaffold attachment regions or SARs) are genomic DNA sequences which bind
specifically to components of the nuclear matrix. See Boulikas, J. Cell. Biochem. 52:14
(1993). These sequences are thought to define independent chromatin domains through
their attachment to the nuclear matrix. Both transcription and replication are thought to
occur at the nuclear matrix.



Transformation of a cell using a transgene flanked by one or more
MARs has been shown to increase expression of the transgene product, compared to
transformation using a construct lacking MARs. See Allen et al., Plant Cell 8:899
(1996); Bonifer et al., EMBO J. 9:2843 (1990); McKnight et al., Proc. Natl. Acad. Sci.
USA 89:6943 (1992); Phi-Van et al., Mol. Cell. Biol. 10:2303 (1990). Flanking a GUS
reporter gene with yeast MARs has been reported to result in higher and less variable
transgene expression in plant cells. Allen et al., Plant Cell 5:603 (1993).

Summary of the Invention

In view of the foregoing, a first aspect of the present invention is an
isolated DNA molecule having a nucleotide sequence selected from the group consisting
of SEQ ID NOS: 1, 2, 4-11 and 13, and sequences that hybridize to this isolated DNA
under stringent conditions.

A further aspect of the present invention is a DNA construct
comprising a transcription initiation region, a structural gene operatively associated with
the transcription initiation region, and at least one matrix attachment region of the
present invention positioned either 5' to the transcription initiation region or 3' to the
structural gene.

A further aspect of the present invention is a vector comprising a DNA
construct as described above, including plasmids, viruses and plant transformation
vectors.

A further aspect of the present invention is a host cell containing a
DNA construct as described above, including plant and animal host cells.

A further aspect of the present invention is a method of identifying
matrix attachment regions in a DNA molecule of known nucleotide sequence, by
identifying a sequence section of at least twenty contiguous nucleotides that is at least
90% A or T nucleotides. The method may further comprise preparing a MAR molecule
of at least about 300 nucleotides, comprising the identified MAR motif.

Brief Description of the Drawings

Figure 1 provides maps depicting generalized plasmids from the
cloning of random matrix associated DNA into pBluescript II SK+. 1A is DNA isolated
after digestion with RsaI and ligated into the EcoRV site (clones 1-4, 6-8, 2-23 and
26-28). 1B is DNA isolated after digestion with TaqI and ligated into the ClaI site (clones
11, 15, 34 and 35). 1C is DNA isolated after digestion with EcoRI and ligated into the
EcoRI site (clones 109, 113, 115 and 116). 1D is DNA isolated after digestion with
HindIII and ligated into the HindIII site (clones 201-203, 205, 206, 209, 211 and
216-220). 1E is DNA isolated after digestion with DNaseI and ligated into the HincII site
(clones 302, 203, 205, 311 and 319). 1F is a map of ToRB7-6, which serves as a
positive control in the exogenous binding assay.

Figure 2 provides plasmid maps of specific clones or subclones chosen
for sequencing. Inserts are indicated by shaded boxes labeled either 'MAR' for binding
clones or 'insert' for non-binding clones. 2A is plasmid pS1 containing MAR SEQ ID
NO:1; 2B is plasmid pS4 containing MAR SEQ ID NO:2; 2C is plasmid pS8 containing
non-binding SEQ ID NO:3; 2D is plasmid pS115 containing MAR SEQ ID NO:4; 2E is
MAR plasmid pS116; 2F is plasmid pS116-L1 containing MAR SEQ ID NO:5, which
is a smaller core binding fragment of clone 116; 2G is MAR plasmid pS202; 2H is
plasmid pS202-1 containing MAR SEQ ID NO:6, which is one of two binding fragments
of clone 202; 2I is plasmid pS202-2 containing MAR SEQ ID NO:7, which is the
second of two binding fragments of clone 202; 2J is plasmid pS203 containing a
non-binding insert; 2K is MAR plasmid pS205; 2L is plasmid pS205-2 containing MAR
SEQ ID NO:8, which is a core binding sequence from clone 205; 2M is MAR plasmid
pS206; 2N is plasmid pS206-1 containing MAR SEQ ID NO:9, which is a core binding
sequence from clone 206; 2O is MAR plasmid pS211; 2P is plasmid pS211-1
containing MAR SEQ ID NO:10, which is a core binding sequence from clone 211; 2Q
is MAR plasmid pS217; 2R is plasmid pS217-1 containing MAR SEQ ID NO:11,
which is a core binding sequence from clone 217; 2S is plasmid pS218 containing
non-binding insert SEQ ID NO:12; 2T is MAR plasmid pS220; 2U is plasmid pS220-1
containing MAR SEQ ID NO:13, which is a core binding sequence from clone 220; 2V
is plasmid pRB7-6, containing the MAR ToRB7-6 fragment (SEQ ID NO:20) used as a
positive control; 2W is plasmid pGCA887 (containing an insert from yeast ARS1
cloned into the vector pBCKS+ (Stratagene)), which serves as a standard for weak
binding to the nuclear matrix (SEQ ID NO:21).

Figure 3 provides the sequences of the MAR clones and subclones of the
present invention, and the control sequences of the known TobRB7 MAR (SEQ ID
NO:20) and yeast ARS1 MAR (SEQ ID NO:21).

Figure 4 provides graphic representations depicting the locations of different
MAR DNA motifs within the sequenced clones or subclones. Binding strengths are
indicated on a scale of 0-100. The A box (AATAAAYAAA) (SEQ ID NO:14) is
represented by 'A', with 8/10 matches required for motif identification. The T box
(TTWTWTTWTT) (SEQ ID NO:15) is represented by 'T', with 9/10 matches required
for motif identification. The ARS consensus sequence (WTTTATRTTTW) (SEQ ID
NO:16) is represented by 'R', with 10/11 matches required for motif identification. The
topoisomerase II consensus sequence (GTNWAYATTNATNNR) (SEQ ID NO:17) is
represented by 'O', with 13/15 matches required for motif identification. If motifs
overlapped, only one is shown. Filled boxes indicate stretches of 20 base pairs
consisting of >90% AT DNA. Base unwinding regions are represented by 'U'
(AATATATTT; SEQ ID NO:22; Bode et al., Science 255:195 (1992)).
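
The matching rule described for Figure 4 (a degenerate consensus sequence plus a minimum number of agreeing positions) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `IUPAC`, `MOTIFS` and `find_motif` are hypothetical, only the strand as written is scanned, and every qualifying position is reported rather than collapsing overlapping motifs as the figure does.

```python
# Standard IUPAC nucleotide codes appearing in the consensus sequences
# (W = A or T, Y = C or T, R = A or G, N = any base).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "Y": "CT", "R": "AG", "N": "ACGT",
}

# Consensus and minimum number of matching positions, as stated for Figure 4.
MOTIFS = {
    "A box": ("AATAAAYAAA", 8),
    "T box": ("TTWTWTTWTT", 9),
    "ARS consensus": ("WTTTATRTTTW", 10),
    "Topoisomerase II": ("GTNWAYATTNATNNR", 13),
}


def find_motif(seq, consensus, min_matches):
    """Return 0-based start positions where at least `min_matches`
    positions of `seq` agree with the degenerate `consensus`."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(consensus) + 1):
        window = seq[start:start + len(consensus)]
        matches = sum(base in IUPAC[code] for base, code in zip(window, consensus))
        if matches >= min_matches:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Made-up test sequence containing one perfect A box at offset 2.
    example = "GGAATAAACAAAGG"
    for name, (consensus, min_matches) in MOTIFS.items():
        print(name, find_motif(example, consensus, min_matches))
```

To cover both strands, the same scan could be repeated on the reverse complement of the sequence.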

Figure 5 graphs the numbers of blocks of 20 or more nucleotides that
consist of 90% or greater A or T nucleotides found in the sequenced clones (SEQ ID
NOS: 1, 2, 4-11 and 13) versus binding strength (indicated as between 0-100). The two
well-characterized MARs (TobRB7 and ARS1), as well as two non-binding clones
(SEQ ID NOs:3 and 12) were included in the analysis.

Figure 6 graphs the %AT found in the sequenced clones (SEQ ID
NOS: 1, 2, 4-11 and 13) versus binding strength (indicated as between 0-100). The two
well-characterized MARs (TobRB7 and ARS1), as well as two non-binding clones
(SEQ ID NOs:3 and 12) were included in the analysis.

Figure 7 graphs the number of T boxes found in the sequenced clones
(SEQ ID NOS: 1, 2, 4-11 and 13) versus binding strength (indicated as between 0-100).
The two well-characterized MARs (TobRB7 and ARS1), as well as two non-binding
clones (SEQ ID NOs:3 and 12) were included in the analysis.

Figure 8 graphs the number of A boxes found in the sequenced clones
(SEQ ID NOS: 1, 2, 4-11 and 13) versus binding strength (indicated as between 0-100).
The two well-characterized MARs (TobRB7 and ARS1), as well as two non-binding
clones (SEQ ID NOs:3 and 12) were included in the analysis.

Figure 9 graphs the number of base unwinding regions (BUR; SEQ
ID NO:22) found in the sequenced clones (SEQ ID NOS: 1, 2, 4-11 and 13) versus
binding strength (indicated as between 0-100). The two well-characterized MARs
(TobRB7 and ARS1), as well as two non-binding clones (SEQ ID NOs:3 and 12) were
included in the analysis.

Figure 10 graphs the length of the sequenced clones (SEQ ID NOS: 1,
2, 4-11 and 13) versus binding strength (indicated as between 0-100). The two
well-characterized MARs (TobRB7 and ARS1), as well as two non-binding clones (SEQ ID
NOs:3 and 12) were included in the analysis.

Figure 11 graphs the number of ARS motifs found in the sequenced
clones (SEQ ID NOS: 1, 2, 4-11 and 13) versus binding strength (indicated as between
0-100). The two well-characterized MARs (TobRB7 and ARS1), as well as two
non-binding clones (SEQ ID NOs:3 and 12) were included in the analysis.

Figure 12 graphs the number of Topoisomerase motifs found in the
sequenced clones (SEQ ID NOS: 1, 2, 4-11 and 13) versus binding strength (indicated
as between 0-100). The two well-characterized MARs (TobRB7 and ARS1), as well as
two non-binding clones (SEQ ID NOs:3 and 12) were included in the analysis.




Detailed Description of the Invention 

Matrix attachment regions (MARs) are structural components of
chromatin that form topologically constrained loops of DNA through their interaction
with the proteinaceous nuclear matrix. MARs have been found to co-localize with a
variety of functional elements within the nucleus including transcriptional domain
boundaries (Jarman and Higgs, EMBO J. 7:3337 (1988); Phi Van and Stratling, EMBO
J. 7:655 (1988); Levy-Wilson and Fortier, J. Biol. Chem. 264:21196 (1990)), promoters,
enhancers (Gasser and Laemmli, Cell 46:521 (1986); Cockerill and Garrard, Cell 44:273
(1986); van der Geest, Plant J. 6:413 (1994)), introns (Kas and Chasin, J. Mol. Biol.
194:677 (1987); Forrester et al., Science 265:1221 (1994)) and putative origins of
replication (Brylawski et al., Cancer Res. 53:3865 (1993)), suggesting that MARs may
play functional roles in addition to their purely structural role within the nucleus. It
appears that not all MARs are involved in the same processes, and that categories or
groups of these elements with distinct features and functions exist.

The characteristics of MARs that dictate their binding to the nuclear
matrix are not known. Presently, the only definition of a MAR is operational, based on
the ability to bind to the nuclear matrix. It is known that MARs are AT rich, but not all
AT-rich DNA will bind to the nuclear matrix; MARs have also been reported to contain
a number of short sequence motifs, but the necessity of these motifs has not been
established. Motifs reported to occur in MARs include A boxes, T boxes, the ARS
consensus and the consensus sequence for Drosophila topoisomerase. In addition,
several secondary structure motifs have been reported to be associated with MARs
including base pair unwinding regions, bent DNA and single stranded regions.

To date most matrix attachment regions have been identified through
their association with a well-characterized gene. This type of sampling creates a bias
that could hinder efforts in defining MAR sequences. It would be useful to be able to
identify MARs by sequence alone. The present inventors obtained a group of DNA
fragments that were MARs by operational definition, by purifying DNA associated with
tobacco NT-1 nuclear matrices prepared using several different nucleases. These
sequences were cloned and tested for their ability to rebind to the nuclear matrix, in
order to identify MARs. Once MARs were identified, they were sequenced and
analyzed for AT content and the presence of common motifs. The significance of each
identified motif was assessed through correlation with the binding strength of MARs to
the nuclear matrix.
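
As an illustration of how a motif count might be correlated with binding strength, the following Python sketch computes a Pearson correlation coefficient. The text does not state which correlation statistic was used, so the choice of Pearson's r is an assumption, and the function name `pearson_r` and the numbers in the usage section are hypothetical values for illustration only, not data from this document.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values for illustration only (one entry per clone):
    motif_counts = [0, 1, 2, 4, 6]            # e.g. number of motifs found
    binding_strength = [5, 20, 35, 60, 90]    # on the 0-100 scale
    print(round(pearson_r(motif_counts, binding_strength), 3))
```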

WO 99/07866    PCT/US98/16344

The present inventors identified a number of novel MAR sequences, and
identified a new MAR motif whose frequency significantly correlates with the binding
strength of a MAR. The present inventors found no significant correlation between
binding strength and the length of the MAR fragment. However, a significant
relationship between binding strength and overall AT content was identified. This is the
first report of a correlation between the abundance of certain MAR-related motifs and
MAR binding strength. In addition, the newly identified MAR-related motif of local
AT-rich regions (sections of 20 contiguous nucleotides that are >90% A and/or T) has a
higher correlation to MAR binding strength than any of the previously identified motifs.
These findings provide a method for the identification of MAR regions in DNA
molecules of known nucleotide sequence. The method comprises identifying, in the
known DNA sequence, regions or areas of the sequence which are at least 20 contiguous
nucleotides in length and which consist of at least 90% A and/or T nucleotides. The
presence of a 20-bp region of >90% AT indicates a MAR; a MAR may contain
multiple regions of >90% AT. The identification of such regions may be carried out by
techniques that are well-known in the art, including sequencing the DNA to be screened
and reviewing a printed DNA sequence for such regions. Contiguous fragments of the
original DNA sequence that are from one to several kilobases (from about 3,000
nucleotides, 2,000 nucleotides, or about 1,000 nucleotides) in length to about 500, 400,
or 300 bases in length, and which encompass the 20-bp regions of >90% AT can then
be isolated (or created de novo by known synthesis techniques) and utilized as MARs.
Optionally, the isolated fragments can first be tested for MAR binding strength, for
example using an exogenous nuclear matrix binding assay as described herein.
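
The screening step described above (finding stretches of 20 contiguous nucleotides that are at least 90% A and/or T) lends itself to a simple sliding-window scan. The Python sketch below is one possible way to do it, not the procedure used by the inventors; the function name `at_rich_regions` and its parameters are hypothetical, and the threshold is expressed as "at least 18 of 20" to follow the "at least 90%" wording.

```python
def at_rich_regions(seq, window=20, min_at=18):
    """Return merged (start, end) half-open intervals covered by windows of
    `window` nucleotides that contain at least `min_at` A or T bases."""
    seq = seq.upper()
    regions = []
    if len(seq) < window:
        return regions
    # A/T count of the first window; updated incrementally as the window slides.
    count = sum(base in "AT" for base in seq[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            count -= seq[start - 1] in "AT"            # base leaving the window
            count += seq[start + window - 1] in "AT"   # base entering the window
        if count >= min_at:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous hit: extend that region.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Made-up sequence: 20 nt of pure AT flanked by GC repeats.
    example = "GC" * 10 + "AT" * 10 + "GC" * 10
    print(at_rich_regions(example))  # [(18, 42)]
```

A candidate MAR fragment of the lengths mentioned above could then be chosen so that it spans one or more of the returned intervals, and tested in the binding assay.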


MARs in nature are double-stranded genomic DNA molecules. The
MARs of the present invention include those of SEQ ID NO:1, SEQ ID NO:2, SEQ ID
NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ
ID NO:10, SEQ ID NO:11 and SEQ ID NO:13. The sequences provided represent one
strand of the double-stranded MAR DNA; the sequence of the complementary strand is
readily apparent to those of ordinary skill in the art.
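
For readers working with the sequences computationally, the complementary strand can be derived mechanically from the strand provided. A minimal Python sketch follows; the function name `reverse_complement` is hypothetical, and only the four unambiguous bases are handled.

```python
# Translation table for Watson-Crick base pairing (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # The base unwinding region motif cited for Figure 4 (SEQ ID NO:22).
    print(reverse_complement("AATATATTT"))  # AAATATATT
```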

It will be apparent to those of skill in the art that minor sequence
variations from the sequences provided above will not affect the function of the MARs of
the present invention. MAR DNA sequences of the present invention include sequences
that are functional MARs which hybridize to DNA sequences of SEQ ID NO:1, SEQ ID
NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ
ID NO:9, SEQ ID NO:10, SEQ ID NO:11 or SEQ ID NO:13 (or the complementary
sequences thereto) under stringent conditions. For example, hybridization of such
sequences may be carried out under conditions represented by a wash stringency of 0.3M
NaCl, 0.03M sodium citrate, and 0.1% SDS at 60°C, or even 70°C, in a standard in situ
hybridization assay. (See J. Sambrook et al., Molecular Cloning, A Laboratory Manual
(2d ed. 1989)(Cold Spring Harbor Laboratory)). In general, DNA sequences that act as
MARs and hybridize to the DNA sequences given above will have at least 70%, 75%,
80%, 85%, 90%, 95% or even 97% or greater sequence similarity to the MAR sequences
provided herein. (Determinations of sequence similarity are made with the two sequences
aligned for maximum matching; gaps in either of the two sequences being matched are
allowed in maximizing matching.)
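
One simple way to approximate "aligned for maximum matching, with gaps allowed" is to count the largest number of positions that can be matched in order between the two sequences (the longest common subsequence). The Python sketch below does this under two assumptions that the text does not state: gaps carry no penalty, and the percentage is taken relative to the longer sequence. The function names are hypothetical, and dedicated alignment software would normally be used in practice.

```python
def max_matches(a, b):
    """Maximum number of identical positions obtainable when gaps may be
    inserted freely in either sequence (longest common subsequence length)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_similarity(a, b):
    """Percent of matched positions relative to the longer sequence
    (the choice of denominator is an assumption of this sketch)."""
    a, b = a.upper(), b.upper()
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    return 100.0 * max_matches(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Made-up sequences differing by a single deleted base.
    print(percent_similarity("AATTAATT", "AATTATT"))  # 87.5
```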

MARs of the present invention may consist of or comprise the specific
sequences provided herein, or nucleotide sequences having substantial sequence similarity
to the sequences provided herein that retain MAR functions. As used herein, 'substantial
sequence similarity' means that DNA sequences which have slight and non-consequential
sequence variations from the specific sequences disclosed herein are considered to be
equivalent to the disclosed sequences. In this regard, 'slight and non-consequential' sequence
variations mean that sequences with substantial sequence similarity will be functionally
equivalent to the sequences disclosed and claimed herein. Functionally equivalent
sequences will function in substantially the same manner as the sequences disclosed and
claimed herein.

DNA constructs of the present invention may be used to transform cells
from a variety of organisms, including animals and plants (i.e., vascular plants). As used
herein, plants includes both gymnosperms and angiosperms (i.e., monocots and dicots).
As used herein, animals includes mammals, both primate and non-primate.
Transformation according to the present invention may be used to increase expression
levels of transgenes in stably transformed cells. Cells may be transformed while in cell
culture, or while in vivo or in situ in a tissue, organ, or intact organism.

The term "operatively associated," as used herein, refers to DNA
sequences on a single DNA molecule which are associated so that the function of one is
affected by the other. Thus, a transcription initiation region is operatively associated
with a structural gene when it is capable of affecting the expression of that structural
gene (i.e., the structural gene is under the transcriptional control of the transcription
initiation region). The transcription initiation region is said to be "upstream" from the
structural gene, which is in turn said to be "downstream" from the transcription initiation
region.

DNA constructs, or "expression cassettes," of the present invention
preferably include, 5' to 3' in the direction of transcription, a first matrix attachment
region, a transcription initiation region, a structural gene operatively associated with the
transcription initiation region, a termination sequence including a stop signal for RNA
polymerase and a polyadenylation signal for polyadenylation (e.g., the nos terminator),
and a second matrix attachment region. All of these regions should be capable of
operating in the cells to be transformed. The termination region may be derived from
the same gene as the transcription initiation or promoter region, or may be derived from
a different gene.

The matrix attachment regions (or "MARs") of the present invention
have a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1, 2,
4-11 and 13 provided herein. These MARs may be isolated from natural sources or may
be chemically synthesized.

MARs are known to act in an orientation-independent manner. Poljak
et al., Nucleic Acids Res. 22:4386 (1994). Genetic constructs of the present invention
may contain MARs oriented in either direction (5'-3' or 3'-5'), as direct repeats in a
single orientation (→ →), direct repeats in the opposite orientation (← ←), or either of
two possible indirect repeats (← →) or (→ ←). The genetic constructs of the present
invention may contain a single MAR as disclosed herein, multiple MARs of the present
invention, or MARs of the present invention in conjunction with other MARs. A DNA
construct of the present invention may comprise a first MAR of the present invention 5'
to the transcription initiation region and a second MAR of a different sequence situated
3' to the structural gene, or vice versa.

The transcription initiation region, which preferably includes the RNA
polymerase binding site (promoter), may be native to the host organism to be
transformed or may be derived from an alternative source, where the region is functional
in the host. Other sources include the Agrobacterium T-DNA genes, such as the
transcriptional initiation regions for the biosynthesis of nopaline, octopine, mannopine,
or other opine transcriptional initiation regions, transcriptional initiation regions from
plants, transcriptional initiation regions from viruses (including host specific viruses), or
partially or wholly synthetic transcription initiation regions. Transcriptional initiation
and termination regions are well known. See, e.g., De Greve, J. Mol. Appl. Genet. 1,
499-511 (1983); Salomon et al., EMBO J. 3, 141-146 (1984); Garfinkel et al., Cell 27,
143-153 (1983); and Barker et al., Plant Mol. Biol. 2, 235-350 (1983).

The transcriptional initiation regions may, in addition to the RNA
polymerase binding site, include regions which regulate transcription, where the
regulation involves, for example, chemical or physical repression or induction (e.g.,
regulation based on metabolites or light) or regulation based on cell differentiation (such
as associated with leaves, roots, seed, or the like in plants). Thus, the transcriptional
initiation region, or the regulatory portion of such region, is obtained from an
appropriate gene which is so regulated. For example, the ribulose-1,5-bisphosphate
carboxylase gene is light-induced and may be used for transcriptional initiation. Other
genes are known which are induced by stress, temperature, wounding, pathogen effects,
etc.

Structural genes are those portions of genes which comprise a DNA
segment coding for a protein, polypeptide, or portion thereof, possibly including a
ribosome binding site and/or a translational start codon, but lacking a transcription
initiation region. The term can also refer to introduced copies of a structural gene where
that gene is also naturally found within the cell being transformed. The structural gene
may encode a protein not normally found in the cell in which the gene is introduced or
in combination with the transcription initiation region to which it is operatively
associated, in which case it is termed a heterologous structural gene. Genes which may
be operatively associated with a transcription initiation region of the present invention
for expression in a plant species may be derived from a chromosomal gene, cDNA, a
synthetic gene, or combinations thereof. Any structural gene may be employed. Where
plant cells are transformed, the structural gene may encode an enzyme to introduce a
desired trait, such as glyphosate resistance; a protein such as a Bacillus thuringiensis
protein (or fragment thereof) to impart insect resistance; or a plant virus protein or
fragment thereof to impart virus resistance.

The expression cassette may be provided in a DNA construct which
also has at least one replication system. For convenience, it is common to have a
replication system functional in Escherichia coli, such as ColE1, pSC101, pACYC184,
or the like. In this manner, at each stage after each manipulation, the resulting construct
may be cloned, sequenced, and the correctness of the manipulation determined. In
addition, or in place of the E. coli replication system, a broad host range replication
system may be employed, such as the replication systems of the P-1 incompatibility
plasmids, e.g., pRK290. In addition to the replication system, there will frequently be at
least one marker present, which may be useful in one or more hosts, or different markers
for individual hosts. That is, one marker may be employed for selection in a prokaryotic
host, while another marker may be employed for selection in a eukaryotic host,
particularly a plant host. The markers may provide protection against a biocide, such as
antibiotics, toxins, heavy metals, or the like; provide complementation, for example by
imparting prototrophy to an auxotrophic host; or provide a visible phenotype through
the production of a novel compound. Exemplary genes which may be employed include
neomycin phosphotransferase (NPTII), hygromycin phosphotransferase (HPT),
chloramphenicol acetyltransferase (CAT), nitrilase, and the gentamicin resistance gene.
For plant host selection, non-limiting examples of suitable markers are β-glucuronidase,
providing indigo production; luciferase, providing visible light production; NPTII,
providing kanamycin resistance or G418 resistance; HPT, providing hygromycin
resistance; and the mutated aroA gene, providing glyphosate resistance.

The various fragments comprising the various constructs, expression
cassettes, markers, and the like may be introduced consecutively by restriction enzyme
cleavage of an appropriate replication system, and insertion of the particular construct or
fragment into the available site. After ligation and cloning the DNA construct may be
isolated for further manipulation. All of these techniques are amply exemplified in the
literature and find particular exemplification in Sambrook et al., Molecular Cloning: A
Laboratory Manual, (2d Ed. 1989)(Cold Spring Harbor Laboratory, Cold Spring
Harbor, NY).

Vectors which may be used to transform plant tissue with DNA 
constructs of the present invention include vectors used for Agrobacterium-mediated 
transformation and ballistic vectors, as well as vectors suitable for direct DNA-mediated 
transformation. 

Microparticles carrying a DNA construct of the present invention,
which microparticles are suitable for the ballistic transformation of a cell, are also useful
for transforming cells according to the present invention. The microparticle is propelled
into a cell to produce a transformed cell. Where the transformed cell is a plant cell, a
plant may be regenerated from the transformed cell according to techniques known in
the art. Any suitable ballistic cell transformation methodology and apparatus can be
used in practicing the present invention. Exemplary apparatus and procedures are
disclosed in Stomp et al., U.S. Patent No. 5,122,466; and Sanford and Wolf, U.S. Patent
No. 4,945,050 (the disclosures of all U.S. Patent references cited herein are incorporated
herein by reference in their entirety). When using ballistic transformation procedures,
the expression cassette may be incorporated into a plasmid capable of replicating in the
cell to be transformed. Examples of microparticles suitable for use in such systems
include 1 to 5 µm gold spheres. The DNA construct may be deposited on the
microparticle by any suitable technique, such as by precipitation.

Plant species may be transformed with the DNA construct of the
present invention by the DNA-mediated transformation of plant cell protoplasts and
subsequent regeneration of the plant from the transformed protoplasts in accordance
with procedures well known in the art.

Any plant tissue capable of subsequent clonal propagation, whether by
organogenesis or embryogenesis, may be transformed with a vector of the present
invention. The term "organogenesis," as used herein, means a process by which shoots
and roots are developed sequentially from meristematic centers; the term
"embryogenesis," as used herein, means a process by which shoots and roots develop
together in a concerted fashion (not sequentially), whether from somatic cells or
gametes. The particular tissue chosen will vary depending on the clonal propagation
systems available for, and best suited to, the particular species being transformed.
Exemplary tissue targets include leaf disks, pollen, embryos, cotyledons, hypocotyls,
megagametophytes, callus tissue, existing meristematic tissue (e.g., apical meristems,
axillary buds, and root meristems), and induced meristem tissue (e.g., cotyledon
meristem and hypocotyl meristem).

Plants of the present invention may take a variety of forms. The plants
may be chimeras of transformed cells and non-transformed cells; the plants may be
clonal transformants (e.g., all cells transformed to contain the expression cassette); the
plants may comprise grafts of transformed and untransformed tissues (e.g., a
transformed root stock grafted to an untransformed scion in citrus species). The
transformed plants may be propagated by a variety of means, such as by clonal
propagation or classical breeding techniques. A dominant selectable marker (such as
npt II) can be associated with the expression cassette to assist in breeding.

Plants which may be employed in practicing the present invention
include (but are not limited to) tobacco (Nicotiana tabacum), potato (Solanum
tuberosum), soybean (Glycine max), peanuts (Arachis hypogaea), cotton (Gossypium
hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea
spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.),
cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado
(Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera
indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium
occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar
beets (Beta vulgaris), corn (Zea mays), wheat, oats, rye, barley, rice, vegetables,
ornamentals, and conifers. Vegetables include tomatoes (Lycopersicon esculentum),
lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus
limensis), peas (Pisum spp.) and members of the genus Cucumis such as cucumber (C.
sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo). Ornamentals
include azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), hibiscus
(Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus
spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia
(Euphorbia pulcherrima), and chrysanthemum. Gymnosperms which may be employed
in carrying out the present invention include conifers, including pines such as loblolly
pine (Pinus taeda), slash pine (Pinus elliottii), ponderosa pine (Pinus ponderosa),
lodgepole pine (Pinus contorta), and Monterey pine (Pinus radiata); Douglas-fir
(Pseudotsuga menziesii); Western hemlock (Tsuga canadensis); Sitka spruce (Picea
glauca); redwood (Sequoia sempervirens); true firs such as silver fir (Abies amabilis)
and balsam fir (Abies balsamea); and cedars such as Western red cedar (Thuja plicata)
and Alaska yellow-cedar (Chamaecyparis nootkatensis).

The examples which follow are set forth to illustrate the present
invention, and are not to be construed as limiting thereof.

EXAMPLE 1 
Materials and Methods 

NT-1 Protoplast Isolation

One hundred ml cultures of four day old tobacco NT-1 suspension cells were
spun at 1400 rpm (585 x g) for five minutes in a Beckman GPR table top centrifuge rotor
GH 3.7 and washed in 10 mM MES (2-[N-morpholino]ethane-sulfonic acid sodium salt,
Sigma M-3885) pH 5.5, 0.4M mannitol. The pellet was resuspended in 100 ml of
10 mM MES pH 5.5, 0.4M mannitol containing 1 g of cellulase (Onozuka RS, Yakult
Pharmaceutical LTD) and 0.1 g of pectolyase (Y-23, Seishin Corp.) and incubated for 30
to 60 minutes at 28°C with gentle shaking in order to remove the cell wall. The
resulting protoplasts were pelleted at 1400 rpm and washed two times in 50 ml of cold
(4°C) 0.4M mannitol (unbuffered).

NT-1 Nuclei Isolation 

The protoplasts were pelleted and resuspended in 50 ml of Nuclei Isolation
Buffer 1 (NIB1) at pH 6.5 (NIB1 = 0.5M hexylene glycol, 20mM
N-2-hydroxyethylpiperazine-N-ethanesulfonic acid (hepes), 20mM KCl, 1% thiodiglycol,
50mM spermine (Sigma S-2876), 125mM spermidine (Sigma S-2501), 0.5mM
phenylmethylsulfonyl fluoride (PMSF 2M stock in methanol), 2 µg/ml aprotinin (Sigma
A-6279), 0.5% Triton X-100, 0.5mM EDTA). This procedure solubilizes the plasma
membrane and releases nuclei. After a five minute incubation on ice, the nuclei were
filtered through a tier of 100 µm, 50 µm, and 30 µm nylon mesh to remove the cellular
debris and then spun through 15% Percoll (Pharmacia 17-0891-01)/NIB1 for further
purification. The pelleted nuclei were washed two times with Nuclei Isolation Buffer
2 (NIB2 = NIB1 without EDTA).

Quantification of Nuclei 

The nuclei were resuspended in a suitable volume of storage buffer (NIB2 in
50% glycerol) such that the suspension would have an absorbance reading of 10 at 260
nm. Absorbance was determined by diluting 2 µl of nuclei in 0.5 ml of 2.2M sodium
chloride, 5.5M urea. One ml aliquots were stored at -70°C until needed. The number of
nuclei per tube was determined by counting aliquots using a hemocytometer. Although
there was some variation between preparations, in general, each 1 ml tube with an
absorbance of 10 contained about 3.5 million nuclei.


Preparation of Nuclear Halos and Nuclear Matrices 

Approximately 3.5 million nuclei (one tube stored at -70°C) were thawed on ice
and washed in 10 ml of Nuclei Isolation Buffer 3 (NIB3 = 0.5M hexylene glycol, 20mM
hepes pH 7.4, 20mM KCl, 1% thiodiglycol, 50mM spermine, 125mM spermidine,
0.5mM PMSF, 2 µg/ml aprotinin). The nuclei were pelleted at 1400 rpm, resuspended
in 200 µl of NIB3 containing 1mM CuSO4 and incubated at 42°C for 15 minutes in
order to stabilize the nuclear matrix.

To remove the histones and other soluble proteins, the nuclei were incubated in
10 ml of Halo Isolation Buffer 2 (HIB2 = 10mM 3,5-diiodosalicylic acid lithium salt
(Sigma D-3635), 100mM lithium acetate, 20mM hepes, 2mM EDTA, 0.1% digitonin,
0.5mM PMSF, 2 µg/ml aprotinin) for 15 minutes at room temperature. (Digitonin is
prepared by mixing 5 g in 12.5 ml of methanol, heating to 65°C to dissolve, filtering
through Whatman #1 filter paper, recrystallizing by removing the methanol under
vacuum, weighing the resulting crystals and resuspending in water at a concentration of
5%, and storing at -20°C until needed.) When histones are removed, the coiling
restraints on the DNA are removed, allowing the DNA to spill out of the nucleus to form
a 'nuclear halo'.

The nuclear halos were pelleted at 3600 rpm (2900 x g) and then washed with
10 ml of Digestion/Binding Buffer (D/BB = 70mM NaCl, 20mM Tris-HCl pH 8.0,
20mM KCl, 0.1% digitonin, 1% thiodiglycol, 50mM spermine, 125mM spermidine,
2 µg/ml aprotinin and 0.5mM PMSF). The second wash contained all the elements of
the first wash, plus 10 µM phenanthroline, and the third wash contained all the elements
of the second wash plus 10mM MgCl2. The halos were resuspended in 500 µl of D/BB
plus all the elements of wash three.

For a discussion of the above-described methods, see Hall and Spiker, Plant
Molecular Biology Manual D2: 1-12, Kluwer Academic Publishers, Dordrecht, the
Netherlands.

To cleave the DNA, nuclear halos were treated with nucleases (either 250U of
RsaI, TaqI, EcoRI, or HindIII, or DNaseI (Sigma) at 0.1 µg/ml) and incubated at 37°C
for 90 minutes, with the addition of another 250U of restriction enzyme or 0.1 µg/ml of
DNaseI after 45 minutes. The resulting nuclear matrices and their associated DNA were
separated from unbound DNA by centrifugation at 3600 rpm for 5 minutes.

Isolation of functionally-defined MAR DNA

Nuclear matrices were washed with 1 ml of D/BB and 10mM MgCl2 to remove
residual supernatant (unbound) DNA and to remove protease inhibitors. The pellet was
resuspended in 500 µl of protease buffer (10mM Tris-HCl pH 8.0, 20mM EDTA, 0.5%
SDS, 0.5mg/ml proteinase K) and incubated at room temperature overnight. The matrix
bound DNA was further purified by phenol:chloroform extraction and ethanol
precipitation, dried and resuspended in 100 µl of Tris-EDTA (TE = 10mM Tris-HCl
pH 8.0 and 10mM EDTA).


Cloning 

The purified operationally defined MAR DNA fragments were cloned into
pBluescript II SK+ (Stratagene). The vector was digested with either EcoRV, ClaI,
EcoRI, HindIII or HincII (for blunt end ligation of DNaseI generated fragments), and
ligated (using New England Biolabs T4 ligase according to the manufacturer's protocol)
to the purified DNA from the nuclear matrices prepared with RsaI, TaqI, EcoRI, HindIII,
or DNaseI. Stratagene E. coli SURE cells were transformed with the plasmids according
to the manufacturer's protocol.

Isolation of Plasmid DNA

Plasmid DNA was isolated from transformants that were grown in 2 ml of Luria
Broth (10g/l tryptone, 10g/l yeast extract, 5g/l NaCl) with 80 µg/ml ampicillin overnight
at 37°C with shaking. The cells were spun at 13000 rpm in a microfuge for 2 minutes,
and the pellets were resuspended in 150 µl of 20% sucrose, 25mM Tris-HCl pH 8.0,
10mM EDTA. The cells were treated with 350 µl of lysis buffer (1% SDS and 200mM
NaOH) and incubated at room temperature for 10 minutes. After the addition of 250 µl
of 3M sodium acetate pH 5.2, the cells were incubated on ice for 10 minutes and then
spun at 13000 rpm for 20 minutes at 4°C. The supernatant containing the plasmid DNA
was transferred to a fresh tube containing 0.7 ml of isopropanol and spun at 13000 rpm
for 20 minutes at 4°C. The pellets were washed with 70% ethanol, air dried overnight
and then resuspended in 50 µl of TE containing 5 µg of RNase A (Sigma), incubated at
37°C for 1 hour and stored at 4°C until needed. Alternatively, when large quantities of
DNA were required, plasmid DNA was isolated using Qiagen columns according to the
manufacturer's protocol.

End Labeling Protocol 

Plasmid DNAs isolated from individual transformants were end labeled and
tested for binding to the nuclear matrix. One µg of plasmid DNA was digested with the
appropriate enzymes (usually EcoRI and HindIII) to release the fragment, according to
the manufacturer's protocol. A standard end-labeling reaction contained 250ng of
digested DNA (5 µl), 0.5 µl of 10X Klenow buffer, 0.33 µl of dNTPs (2mM
deoxycytidine triphosphate, 2mM deoxyguanosine triphosphate and 2mM
deoxythymidine triphosphate), 2.5 µl of 10mCi/ml α-32P deoxyadenosine triphosphate
(Dupont NEN BLU-012H) and 0.2 µl of 5,000U/ml DNA polymerase large fragment
(NEB Klenow) in a total of 10 µl. The mixture was incubated at room temperature for
15 minutes and the reaction stopped by the addition of 40 µl of TE. The unincorporated
nucleotides were removed by centrifugation through a Sephadex G-50 spin column.
The amount of radioactivity (counts per minute) in the resulting end-labeled DNA was
determined by placing 2 µl of labeled DNA in 3 ml of ScintiVerse (Fisher SX 1-4) and
counting in a Beckman LS 100C scintillation counter.

Matrix Binding-Exogenous Assay

Nuclear halos were treated with 250U of EcoRI and HindIII for 90 minutes at
37°C with the addition of another 250U of each enzyme after 45 minutes. The resulting
nuclear matrices were aliquoted at 50 µl for different binding reactions (different labeled
DNA fragments for testing). Each 50 µl aliquot contained one tenth of the nuclear
matrices, about 350,000, as well as one tenth of the cleaved, non-MAR, endogenous
DNA, which served as nonspecific competitor. Radioactively labeled DNA fragments
of interest were incubated with the nuclear matrices at 50,000 cpm per fragment (about
5ng of DNA) per 50 µl reaction at 37°C for 3 hours with resuspension every 20 minutes.
The pellet and supernatant fractions were separated by centrifugation at 3600 rpm for 5
minutes. The supernatant was transferred to a fresh tube containing 0.5 µl of 0.5M
EDTA pH 8.0 and stored at -20°C. The matrices were washed with 200 µl of D/BB
plus 10mM MgCl2 (D/BB = 70mM NaCl, 20mM Tris-HCl pH 8.0, 20mM KCl, 0.1%
digitonin, 1% thiodiglycol, 50mM spermine, 125mM spermidine) and then resuspended
in 50 µl of protease buffer (10mM Tris-HCl pH 8.0, 20mM EDTA, 0.5% SDS,
0.5mg/ml proteinase K) and incubated at room temperature overnight. Twenty µl
aliquots of the pellet and supernatant fractions were subjected to electrophoresis in a 1%
agarose (FMC SeaKem GTG 50072) gel, prepared and run in TAE (TAE = 40mM
Tris-acetate pH 8.0 and 1mM EDTA). The gel was then treated for 20 minutes in 7%
trichloroacetic acid, dried and exposed to X-ray film (Kodak X-OMAT AR). This
method of representing the DNA bound to the nuclear matrix is called the 'equal
fractions' method, as an equal portion of the DNA from the pellet and the supernatant
fractions (e.g., 20%) is applied to the gel. This approach allows direct determination of
the amount of a fragment partitioning with the pellet or supernatant; very weak-binding
DNA fragments are not scored as MARs.
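
Because equal portions of the pellet and supernatant are loaded, the proportion of a
fragment that is bound can be read directly from the two band signals. The following
Python sketch illustrates that arithmetic together with the no/weak/medium/strong scale
used for Table 1. The function names, the `detection_limit` parameter and the handling
of values falling exactly on 40% or 70% are assumptions made for illustration only; the
specification does not describe how band intensities were quantified.

```python
def percent_bound(pellet_signal: float, supernatant_signal: float) -> float:
    """Percent of a labeled fragment found in the pellet (matrix-bound) fraction.

    Valid only when equal portions of both fractions were loaded on the gel.
    """
    total = pellet_signal + supernatant_signal
    if total <= 0:
        raise ValueError("no signal in either fraction")
    return 100.0 * pellet_signal / total


def binding_class(percent: float, detection_limit: float = 1.0) -> str:
    """Map percent bound onto the scale of Table 1.

    detection_limit is a hypothetical threshold below which binding is
    treated as not detectable; boundary values are assigned upward.
    """
    if percent < detection_limit:
        return "no"
    if percent < 40.0:
        return "weak"
    if percent < 70.0:
        return "medium"
    return "strong"
```

For example, a fragment with three times as much signal in the pellet as in the
supernatant is 75% bound and would be rated as a strong binding MAR on this scale.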

EXAMPLE 2 

Results of Isolation and Testing of Operationally Defined MARs

A random sample of MAR fragments was obtained by purifying matrix
associated DNA and cloning these fragments. Five different preparations of nuclear
matrices were made using one of five different nucleases. From each preparation,
twenty colonies were picked, grown and analyzed for the presence of single inserts. A
total of thirty-nine clones were then tested for their ability to rebind to the nuclear
matrix. Since all clones were obtained using one of the five cloning strategies, the
plasmids obtained using each strategy differ only in the content of their insert. The
generalized plasmid maps are summarized in Figure 1.

Each clone was end labeled and tested for the ability to rebind to the nuclear
matrix using the exogenous assay. A previously identified strong binding MAR
fragment, ToRB7-6 (Hall et al., Proc. Natl. Acad. Sci. USA 88:9320 (1991)), served as a
positive control. In each case, the non-binding vector served as an internal negative
control. Results of such binding assays are shown in Table 1, which contains a
summary of the relative binding strength of all the MAR clones tested. Among the
clones obtained from nuclear halos treated with restriction enzymes that have four base
pair recognition sites (RsaI and TaqI), 9 of 17 fragments had some binding activity, as
compared to 14 of 17 clones when enzymes with six base pair recognition sites were
used. In addition to the 34 clones represented, five clones from the DNaseI treated
nuclear halos were tested (302, 303, 305, 311 and 319); no binding was detected for any
of these samples (results not shown). Relative binding strength was based on the
proportion of each MAR fragment that partitioned in the bound fraction on a scale of 0
to 100%. In Table 1, no = no detectable binding; weak = detectable to 40%; medium =
40-70%; strong = 70-100%.

WO 99/07866    PCT/US98/16344

Table 1

Clones Tested for Matrix Binding Activity*

Four base cutters                    Six base cutters
Clone #      Binding strength        Clone #      Binding strength

RsaI:EcoRV                           HindIII
1            weak                    201          weak
2            weak                    202          medium
3            no                      203          no
4            weak                    205          medium
6            no                      206          weak
7            weak                    209          weak
8            no                      211          strong
21           no                      216          weak
22           weak                    210          no
23           no                      217          weak
26           no                      218          no
27           no                      219          weak
28           weak                    220          medium

TaqI:ClaI                            EcoRI
11           no                      109          weak
15           weak                    113          weak
34           weak                    115          medium
35           weak                    116          strong

* Relative binding strength is based on the proportion of each MAR fragment that
partitioned into the bound fraction on a scale of 0 to 100%, where no = no detectable
binding; weak = detectable to 40% binding; medium = 40% - 70% binding; and strong =
70% - 100% binding.

Because the fragments were isolated through their association with the nuclear
matrix (the operational definition of a MAR), all of the fragments would be expected to
rebind to the matrix in the exogenous assay. However, 40% of the clones did not have
detectable binding activity. There are several possible explanations for this discrepancy.
One such possibility is a cloning artifact. Some of the DNA fragments may have been
altered during cloning, resulting in a loss of binding activity. This possibility can be
substantiated in at least one case, pS8. In this clone the expected sequence at the
ligation site is GATAC, but sequencing revealed GATCA. This indicates that the
fragment was altered during the procedure. Since the clone was created by blunt end
ligation, it is likely that the fragment was broken during some procedure and quite
possible that the resulting fragment had lost its binding capability. Another possibility
is that some non-MAR DNA was trapped within the nuclear matrix during
isolation. This is a rare phenomenon, and would not be expected to result in 40%
non-MAR clones. It is also possible that some of the non-binding clones really are bound to
the nuclear matrix in vivo but that the sensitivity of the in vitro exogenous assay was not
sufficient to detect weak binding fragments (see Materials and Methods for discussion
of sensitivity of detection). Another possible explanation for the isolation of
operationally defined MAR sequences that did not rebind to the nuclear matrix is their
size. Several of the isolated MAR fragments showed significant reduction in binding
when cleaved into smaller pieces (data not shown), which supports the hypothesis of a
lower size limit for MAR fragments. The DNaseI clones ranged in size from 150 to 200
base pairs, and it is possible that none of them had binding activity because
they were below the minimum size requirement. Furthermore, a higher percentage of
the clones obtained using restriction enzymes with six base pair recognition sites (EcoRI
and HindIII) had matrix binding activity (14/17) than those clones obtained using
restriction enzymes with four base pair recognition sites (9/17). Again, this subtle
discrepancy may be the result of the smaller size of the fragments generated using four
base cutters. Fragment size may also play a part in the strength of binding, since all
of the clones isolated using restriction enzymes with four base pair recognition sites
were weak binding MARs or had no detectable binding. It should be noted that
although there appears to be a lower limit of size, within a population of fragments of
the appropriate size (300 bp to several kilobases), there does not appear to be a
statistically significant correlation between binding strength and the size of the MAR
fragment (see Figure 5). Finally, some of the non-binding clones may be parts of
MARs that were cleaved during isolation. It appears that the binding of MAR DNA to
the nuclear matrix may involve extended contact. If a particular sequence is cut within
the extended contact region, it is possible that none of the resulting fragments would be
able to rebind to the nuclear matrix in the exogenous assay.
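
The size argument above can be made concrete with a rough estimate of how often each
restriction enzyme is expected to cut. The Python sketch below is illustrative only: it
treats the DNA as a random sequence with the 70% AT composition that this specification
uses for its motif calculations, which is an assumption (genomic DNA is not random), and
it uses the standard recognition sequences of the four enzymes. The names `BASE_FREQ`,
`SITES`, `site_probability` and `mean_fragment_length` are introduced here for
illustration.

```python
# Base frequencies assumed for 70% AT DNA (0.35 each for A and T, 0.15 each for G and C).
BASE_FREQ = {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15}

# Standard recognition sequences of the restriction enzymes used in Example 1.
SITES = {"RsaI": "GTAC", "TaqI": "TCGA", "EcoRI": "GAATTC", "HindIII": "AAGCTT"}


def site_probability(site: str) -> float:
    """Probability that a given position starts the recognition site in random DNA."""
    p = 1.0
    for base in site:
        p *= BASE_FREQ[base]
    return p


def mean_fragment_length(site: str) -> float:
    """Expected spacing between sites (the reciprocal of the site probability), in bp."""
    return 1.0 / site_probability(site)


for enzyme, site in SITES.items():
    print(f"{enzyme:8s} {site:7s} about {mean_fragment_length(site):.0f} bp")
```

Under these assumptions the four base cutters are expected to yield fragments averaging
a few hundred base pairs, whereas the six base cutters yield fragments of roughly three
kilobases, which is in keeping with the suggestion that fragments from four base cutters
fall nearer the lower size limit for binding.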

EXAMPLE 3

Materials and Methods for Sequence Analysis of MARs

Subcloning and Sequencing

To effectively analyze the sequence of the MAR DNA, DNA that is not
necessary for binding must be excluded. Core binding fragments of the MARs were
identified by testing the binding of different subfragments of the clones and determining
the particular fragments that contained the majority of the binding activity. These
fragments were then cloned and sequenced.

A 998 bp fragment of pS116 was excised using MunI and PstI and ligated into
the EcoRI and PstI sites of pBluescript II SK+ to create pS116-1.1B (SEQ ID NO:5).

A 635 bp and a 1087 bp fragment of pS202 were excised by cleavage with
HindIII and EcoRI and ligated into pBC KS+ (Stratagene) using the same two
restriction enzymes to create pS202-1 (SEQ ID NO:6) and pS202-2 (SEQ ID NO:7),
respectively.

A 704 bp fragment of pS205 was excised with EcoRI and HindIII and ligated
into the EcoRI and HindIII sites of pBC KS+ to create pS205-2 (SEQ ID NO:8).

A 306 bp fragment of pS206 was excised with BamHI and HindIII and
ligated into the BamHI and HindIII sites of pBC KS+ to create pS206-1 (SEQ ID
NO:9).

A 685 bp fragment of pS211 was excised with EcoRI and HindIII and ligated
into the EcoRI and HindIII sites of pBC KS+ to create pS211-1 (SEQ ID NO:10).

A 899 bp fragment of pS217 was excised with BamHI and ligated into the
BamHI site of pBC KS+ to create pS217-1 (SEQ ID NO:11).

A 1499 bp fragment of pS220 was excised with XhoI and HindIII and ligated
into the XhoI and HindIII sites of pBC KS+ to create pS220-1 (SEQ ID NO:13).

All digestions, ligations and transformations were performed according to
manufacturer protocols. The maps of the original plasmids and their subclones are
depicted in Figure 2.

The MAR fragments were sequenced in stages by primer walking. Each clone
was sequenced using the universal -21 M13 (TGTAAAACGACGGCCAGT) (SEQ ID
NO:18) and reverse M13 (CAGGAAACAGCTATGACC) (SEQ ID NO:19) primers at
Iowa State University Nucleic Acids Facility for the initial portions of the sequence. In
the case of longer clones, when a complete sequence was not obtained from the initial
sequencing, internal primers were designed and then constructed at the Molecular
Genetics Facility at North Carolina State University. All interior primers are underlined
in Figure 3. Because of the AT-rich nature of the MARs, primers with suitable melting
temperatures, but that avoided secondary structural features, had to be designed. The
usefulness of each primer was tested by attempting to amplify an internal MAR
fragment using the constructed primer in conjunction with either the universal or reverse
primer in a polymerase chain reaction (PCR). PCR was performed using Boehringer
Mannheim Taq polymerase according to the manufacturer's protocol. Primers that
resulted in successful amplification of DNA were used for additional sequencing at Iowa
State University. Sequences of the primers used are underlined within the sequences in
Figure 3.



EXAMPLE 4 
MAR Sequences 

From the randomly isolated MAR fragments, a sample of ten binding clones
(pS1, pS4, pS115, pS116, pS202, pS205, pS206, pS211, pS217 and pS220) and two
non-binding clones (pS8 and pS218) were chosen for sequence analysis. These
particular MARs were chosen as representatives of the population based on binding
strength. Within this population of ten sequences are several weak, medium and strong
binding MARs, representing the spectrum of binding strengths within the population. The
binding of MAR fragments appears to involve multiple protein-DNA interactions rather
than a single binding site (Gasser et al., 1989); however, it is not known if multiple
interactions are required for MAR binding, or if longer MAR fragments simply consist
of many smaller MARs each acting independently. The binding of a particular MAR
may be affected when that fragment is cleaved, so that shorter fragments may contain
none, some or all of the binding potential of the original full-length clone. In some
instances, all of the fragments are capable of binding to the nuclear matrix, whereas in
others the binding is confined to one of the smaller fragments.

Several of the isolated clones were several kilobases in length. To avoid
including non-MAR DNA in our sequence analysis, these samples were digested into
smaller fragments and the binding of these subfragments tested. In the case of seven
clones (116, 202, 205, 206, 211, 217 and 220) a smaller core binding fragment that
maintained most of the original binding strength was identified. These core binding
fragments were subcloned and used instead of the original sequence (see Figure 2 for
plasmid maps). In the case of clone pS202, two binding subfragments (pS202-1 (SEQ
ID NO:6) and pS202-2 (SEQ ID NO:7)) were identified, each of which maintained a
binding strength similar to that of the original clone. Both fragments are included in this
study. The other three clones (pS1, pS4, and pS115) were sequenced in their entirety
(SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:4, respectively). Two of the three
non-binding clones (pS8 and pS218) were sequenced in their entirety (SEQ ID NO:3 and
SEQ ID NO:12, respectively). The binding strengths of the subclones were rated on a
scale of 0-100 based on the percent of the MAR that partitioned in the bound fraction in
the standard exogenous assay (Figure 4).



EXAMPLE 5 

Motif Significance in MAR Sequences

Several different motifs have previously been identified as associated with MAR
sequences, including A boxes (SEQ ID NO:14), T boxes (SEQ ID NO:15), ARS
consensus (SEQ ID NO:16), and the Drosophila consensus sequence for topoisomerase
II (SEQ ID NO:17). A search was conducted for the presence of these motifs using the
Apple Macintosh program MacVector. In the search, 1 mismatch was allowed in the
cases of the ARS consensus and T box, and 2 mismatches in the cases of the A box and the
consensus for topoisomerase II. Allowing for this number of mismatches results in
similar probabilities of occurrence for all four motifs (see Table 2). The A box and T
box motifs often resulted in overlapping regions due to their AT-rich nature. For
example, within a stretch of 20 bases of Ts, 10 T boxes can be found. The probability of
finding an additional T box upon inclusion of the next base is 0.35 in a region of DNA
that contains 70% AT. To avoid this artificial identification of additional motifs, only
two T boxes would be counted in the above example.

In the MAR DNA fragments obtained from the random cloning procedures
described in the above Examples, AT rich motifs were present. These AT rich regions
are depicted in Figure 4. Additionally, several of the randomly obtained MAR
sequences contained short stretches (20 bp) of highly AT rich (> 90%) DNA. The
locations of these stretches are also shown in Figure 4.
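
One way such stretches could be located is with a sliding 20 bp window that requires at
least 18 A or T bases. The Python sketch below is offered only as an illustration and is
not the search actually used (the motif searches in this Example were run in MacVector).
It merges overlapping qualifying windows into a single region and counts each region
once per full 20 bp of its length, following the counting convention of this Example
(one occurrence for 20-39 bp, two for 40-59 bp, and so on). The function names and the
choice to report a region as the union of its qualifying windows are assumptions.

```python
def at_rich_regions(seq: str, window: int = 20, min_at: int = 18):
    """Return half-open (start, end) intervals covered by windows with >= min_at A/T."""
    seq = seq.upper()
    n = len(seq)
    regions = []
    if n < window:
        return regions
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])  # A/T count of the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one base: add the entering base, drop the leaving one.
            count += is_at[start + window - 1] - is_at[start - 1]
        if count >= min_at:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the current region
            else:
                regions.append((start, end))
    return regions


def count_occurrences(regions, window: int = 20) -> int:
    """Count one occurrence per full `window` bp of each merged region."""
    return sum((end - start) // window for start, end in regions)
```

Because a qualifying window may contain up to two G or C bases, a reported region can
extend a base or two beyond the AT-rich core; a stricter definition would trim those
flanking bases.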

The present inventors tested the occurrence of each of these motifs for statistical
significance against the number of occurrences that would be expected at random, to
determine if these motifs are over-represented in MAR DNA. The probability of the
presence of a specific sequence starting at a specific base within 70% AT rich DNA was
calculated by multiplying the probabilities of each base in the motif (0.35 for A and T,
0.15 for G and C). Since mismatches of 1 or 2 bases were allowed for certain motifs,
the probabilities were adjusted by dividing the calculated probability of occurrence of
the motif with no mismatches by the lowest probability of an individual base (or bases
when two mismatches were allowed) within the motif, yielding a conservative estimate
of probability. In addition, the probabilities were multiplied by a factor of two, since
these motifs can occur in either strand of the DNA. Although the two strands of a DNA
sequence are not independent, the factor of two provides a simple and conservative
method for calculating the expected frequency within double stranded DNA. For
example, the probability of the topoisomerase II consensus sequence
(GTNWAYATTNATNNR) occurring in 70% AT rich DNA without any mismatches is
calculated by multiplying the expected frequency of each base together:

(0.15)(0.35)(1)(0.7)(0.35)(0.5)(0.35)(0.35)(0.35)(1)(0.35)(0.35)(1)(1)(0.5) = 1.68 x 10^-5.

This calculated probability is then adjusted for the allowance of up to two
mismatches by dividing by 0.15 and 0.35, the two lowest expected frequencies of each
base. The resulting value is then multiplied by two, since motifs can occur in either
strand of DNA. The calculated probabilities for an A box with two mismatches, an ARS
consensus with one mismatch, and the consensus for topoisomerase II with two
mismatches are coincidentally identical, 6.428 x 10^-4, whereas the calculated probability
of a T box occurring with one mismatch is slightly higher, 1.26 x 10^-3. The probability
of the occurrence of a 20 bp stretch of DNA containing 90% or greater AT content was
calculated by taking the probability of a single base being either A or T (0.7) to the 18th
power. Therefore, the probability of occurrence of a twenty base pair stretch with two
mismatches (or 90% AT rich DNA) is equal to 1.628 x 10^-3. Since this type of motif
will automatically be present on both strands at the same time, only those motifs found
in one strand were counted. As with the other motifs, overlapping regions were not
counted as separate occurrences; however, this motif often occurs in stretches much
longer than 20 bp. The actual lengths of the regions are depicted in Figure 4. The
number of occurrences was counted as one from 20-39 bp; two from 40-59 bp; and so
on. The probability of a motif starting at any one site was converted to an expected
number of occurrences within a DNA sequence by multiplying by the number of base
pairs in its length.
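
The probability calculation described above can be sketched in a few lines of Python.
The sketch is illustrative: the function names and the IUPAC lookup table are introduced
here, and only the topoisomerase II consensus and the 20 bp AT rich stretch are worked
through, since those are the motifs whose sequences are spelled out in this Example.
Straight multiplication gives about 6.43 x 10^-4 for the adjusted topoisomerase II value,
which agrees with the 6.428 x 10^-4 quoted above to within rounding.

```python
import math

# Base frequencies in 70% AT rich DNA.
BASE = {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15}

# IUPAC ambiguity codes used in the consensus sequence.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "R": "AG", "Y": "CT", "N": "ACGT"}


def position_probs(motif: str):
    """Probability of a match at each motif position."""
    return [sum(BASE[b] for b in IUPAC[symbol]) for symbol in motif]


def exact_probability(motif: str) -> float:
    """Probability of the motif starting at a given base with no mismatches."""
    return math.prod(position_probs(motif))


def adjusted_probability(motif: str, mismatches: int = 0, both_strands: bool = True) -> float:
    """Divide out the lowest per-position probabilities (one per allowed mismatch),
    then double the result if the motif may lie on either strand."""
    probs = sorted(position_probs(motif))
    p = math.prod(probs)
    for lowest in probs[:mismatches]:
        p /= lowest
    return 2.0 * p if both_strands else p


def expected_occurrences(p_site: float, length_bp: int) -> float:
    """Expected number of occurrences in a sequence of length_bp base pairs."""
    return p_site * length_bp


TOPO_II = "GTNWAYATTNATNNR"
p_exact = exact_probability(TOPO_II)            # about 1.69e-5
p_adjusted = adjusted_probability(TOPO_II, 2)   # about 6.43e-4
p_at_rich = 0.7 ** 18                           # about 1.63e-3, one strand only
```

For instance, `expected_occurrences(p_adjusted, 1000)` gives the number of topoisomerase
II consensus matches expected by chance in one kilobase of 70% AT rich DNA.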

For purposes of the calculations, the probability of a particular base occurring at a
particular site was assumed to follow a multinomial distribution, i.e., any of the four
bases can occur at a particular site. Note that this assumption may not be true if there
are restraints on DNA sequences (e.g., five consecutive Gs not allowed), but since such
restraints are presently unknown, the assumption of multinomial distribution was made.
The probability of a particular string of bases (a motif) occurring at a particular site
would be expected to have approximately a normal distribution. The observed
number of MAR motifs was compared to the number expected under this normal
distribution assumption. The observed number of MAR motifs was considered
significantly greater (at the 1% level) than the number expected if it exceeded
a calculated critical value. The critical value is µ + Z(0.01)σ (Steel and Torrie,
Principles and Procedures of Statistics: A Biometrical Approach, McGraw-Hill
Publishing Co., New York, NY (1980)). In this equation µ is the expected number of
MAR motifs, Z(0.01) is the Z statistic at 1% and σ is the standard deviation of the expected
number of occurrences of MAR motifs given a random sequence of bases (σ = √...).
The motifs in question have already been
shown to be associated with certain MAR sequences. The null hypothesis of the motifs
occurring at random is compared, in a one tailed test, to the alternative hypothesis that
these motifs occur more often than would be expected by random occurrence. The 1%
critical values are shown in Table 2. Since the observed number of occurrences for all
five motifs is greater than the critical values, the null hypothesis can be rejected in favor
of the alternative hypothesis that these motifs occur more often in MAR DNA than
would be expected by chance alone. A boxes, T boxes, ARS consensus, topoisomerase
II consensus and 20 bp regions of > 90% AT DNA all occur more often than would be
expected at random.
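
As a rough illustration of the one tailed test, the Python sketch below computes a 1%
critical value of the form µ + Zσ. Two points are assumptions made for this sketch and
are not taken from the specification: σ is taken to be the binomial standard deviation,
the square root of n·p·(1 - p), and the total sequence length used in the usage note is a
hypothetical figure. The Z statistic is obtained from the standard normal distribution
in the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(p_site: float, total_bp: int, alpha: float = 0.01) -> float:
    """One tailed critical value mu + Z * sigma for the number of motif occurrences.

    p_site   : probability of the motif starting at any one site
    total_bp : number of base pairs searched
    sigma    : ASSUMED here to be the binomial standard deviation sqrt(n p (1 - p))
    """
    mu = p_site * total_bp
    sigma = sqrt(total_bp * p_site * (1.0 - p_site))
    z = NormalDist().inv_cdf(1.0 - alpha)  # about 2.326 for alpha = 0.01
    return mu + z * sigma


def is_over_represented(observed: int, p_site: float, total_bp: int, alpha: float = 0.01) -> bool:
    """True when the observed count exceeds the critical value."""
    return observed > critical_value(p_site, total_bp, alpha)
```

With a per-site probability of 6.428 x 10^-4 and a hypothetical 10,000 bp of sequence,
about 6.4 occurrences are expected by chance, and under the stated assumptions an
observed count of 13 or more would exceed the critical value.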

EXAMPLE 6

Motif Frequency and Binding Strength

To determine if a correlation exists between the number of these motifs present
within a particular MAR sequence and its binding strength, the data (Figures 6-11) were
plotted and the significance of the regression coefficient, r, was tested using an F test
(Steel and Torrie, 1980). Included were all eleven binding sequences (SEQ ID NOs: 1, 2,
4-11, and 13); two non-binding sequences (SEQ ID NO:3 and SEQ ID NO:12); and two
well characterized MARs (ToRB7-6 and ARS-1).

The correlation between binding strength and the length of the MAR, as well as
the %AT content of MARs, was also analyzed. The analysis (Table 3) shows a significant
correlation between the number of 20 bp stretches of 90% or greater AT and the binding
strength of MARs. The number of T boxes is also significant, as is the number of A
boxes. No significant correlation was detected in this analysis between binding strength
and either the length of the MAR fragment or the presence of any of the other MAR
related motifs. However, there was a significant relationship between binding strength and
overall AT content. This is the first report of a correlation between the abundance of
certain MAR related motifs and MAR binding strength. In addition, the newly
identified MAR related motif of local AT rich regions (>90%) has a higher correlation
to MAR binding strength than any of the previously identified motifs.

TABLE 3

Motif                r        Fcalc     significance

%AT                  0.77     18.93     **
length               0.38      2.19     ns
# A box/kbp          0.19      0.49     ns
total # A box        0.57      6.48     *
# ARS/kbp            0.29      1.27     ns
total # ARS box      0.35      1.85     ns
# T box/kbp          0.64      9.45     **
total # T box        0.76     17.98     **
# TopoII/kbp         0.06      0.04     ns
total # TopoII       0.23      0.07     ns
# 90% AT/kbp         0.69     12.07     **
total # 90% AT       0.80     24.62     **

r = regression coefficient = [ΣXY - (ΣX ΣY)/n] / √{(ΣX² - [(ΣX)²/n])(ΣY² - [(ΣY)²/n])}
Fcalc = F statistic = (n-2)[r²/(1-r²)]
ns = not significant
* = significant at 95%
** = significant at 99%
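
The two statistics defined beneath Table 3 can be computed as in the following Python
sketch, which is included for illustration only. The function names are introduced
here; n = 15 in the usage note is inferred from the fifteen sequences listed in this
Example (eleven binding, two non-binding and two well characterized MARs).

```python
from math import sqrt


def regression_r(xs, ys) -> float:
    """Regression coefficient r from the sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator


def f_statistic(r: float, n: int) -> float:
    """F statistic for testing r, with 1 and n - 2 degrees of freedom."""
    return (n - 2) * r * r / (1.0 - r * r)
```

As a check, `f_statistic(0.77, 15)` evaluates to about 18.93, the value listed in
Table 3 for %AT.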

The relationship between certain MAR related motifs and MAR binding strength
suggests that these motifs are general components of MARs. The lack of this
relationship for other motifs suggests that these sequences are over-represented without
being associated with MAR function, or may be related to only certain categories of
MARs. In addition, several other MAR related motifs (such as the asymmetric GA-rich
stretches, or homopurine stretches) are found within some, but not all, of the randomly
obtained sequences. The presence of such a motif may indicate a specific class of
MARs, but since there is no information about the location of these random sequences
within the genome, it cannot be inferred from the presence of this motif that a sequence
is a MAR with a specific function (or even a MAR at all).

MARs may interact with the nuclear matrix through a variety of secondary
structure motifs including a narrow minor groove, transiently single stranded regions
and bent DNA. These structural motifs are expected to be present in the MARs
described here because of their high AT content. A narrow minor groove is a feature of
DNA containing long A tracts, a predicted feature within high AT content DNA.

EXAMPLE 5
Increasing Average Expression Levels

Earlier studies (Allen et al., Plant Cell 5:603 (1993)) showed that flanking a
GUS reporter gene with two copies of a yeast MAR element (ARS-1) increased average
GUS expression by 12-fold in stably transformed cell lines. In the present Example, the
same cell line is transformed with constructs similar to those of Allen et al., 1993, but
using a MAR having a sequence selected from SEQ ID NOs:1, 2, 4-11 and 13.

Transformation is achieved by mixing the appropriate reporter test plasmid and a
selection plasmid, co-precipitating them onto microprojectiles, and bombarding plates of
tobacco suspension culture cells as described previously (Allen et al., 1993). Antibiotic-
resistant microcalli are selected and each callus is used to start an independent
suspension culture cell line. Histochemical staining of segments from the original
microcalli shows that the staining intensity is greater in cell lines transformed with MAR
plasmids. After several weeks of growth, with weekly transfers, suspension cells are
harvested, DNA is extracted from each cell line for Southern analysis and quantitative
PCR assays, and portions of the same cell population are used to measure extractable
reporter activity and expressed protein levels. Transgene copy number estimates and
expression data are calculated. Levels of GUS gene expression, measured as GUS
enzyme activity, are assessed and compared to controls.
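
The comparison of expression data described in this Example might be organized as in the following sketch. It is an illustration only: the function names are hypothetical, the numbers in the usage lines are placeholder values rather than measured GUS activities, and the choice of mean, coefficient of variation and per-copy activity as summary statistics is an assumption, not a procedure prescribed here.

```python
from statistics import mean, stdev


def summarize(activities):
    """Mean activity and coefficient of variation across independent cell lines."""
    avg = mean(activities)
    return {"mean": avg, "cv": stdev(activities) / avg}


def fold_change(mar_lines, control_lines):
    """Ratio of average activity in MAR lines to average activity in control lines."""
    return mean(mar_lines) / mean(control_lines)


def per_copy(activity, copy_number):
    """Activity normalized to the estimated transgene copy number."""
    return activity / copy_number


# Placeholder values only (hypothetical, not data from this Example).
mar = [10.0, 20.0, 30.0]
control = [1.0, 2.0, 3.0]
print(fold_change(mar, control))
print(summarize(mar)["cv"])
```

Reporting the coefficient of variation alongside the fold change makes it possible to see whether MAR-flanked lines are both higher and less variable than controls.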

The foregoing examples are illustrative of the present invention, and are not to be
construed as limiting thereof. The invention is described by the following claims, with
equivalents of the claims to be included therein.
THAT WHICH IS CLAIMED IS:

1. An isolated DNA molecule having a nucleotide sequence selected from
the group consisting of:

(a) SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11 and
SEQ ID NO:13; and

(b) sequences that hybridize to isolated DNA of (a) above under
conditions represented by a wash stringency of 0.3M NaCl, 0.03M sodium citrate, and
0.1% SDS at 60°C, and which encode a matrix attachment region.

2. A DNA construct comprising:

(a) a transcription initiation region and a structural gene positioned
downstream from said transcription initiation region and operatively associated
therewith; and

(b) a matrix attachment region according to claim 1 positioned either 5'
to said transcription initiation region or 3' to said structural gene.

3. A DNA construct according to claim 2, wherein said matrix
attachment region is 5' to said transcription initiation region.

4. A DNA construct according to claim 2, wherein said matrix
attachment region is 3' to said structural gene.

5. A DNA construct according to claim 2, further comprising a second
matrix attachment region that differs in sequence from said matrix attachment region
according to claim 1.

6. A DNA construct comprising:

(a) a transcription initiation region and a structural gene positioned
downstream from said transcription initiation region and operatively associated
therewith;

(b) a matrix attachment region according to claim 1 positioned either 5'
to said transcription initiation region or 3' to said structural gene; and

(c) a second matrix attachment region according to claim 1, wherein
said second matrix attachment region is positioned either 5' to said transcription
initiation region or 3' to said structural gene.

7. A DNA construct according to claim 2, further comprising a
termination sequence positioned downstream from said structural gene and operatively
associated therewith.

8. A DNA construct according to claim 2, wherein said first and said
second matrix attachment regions differ in sequence.

9. A vector comprising a DNA construct according to claim 2.

10. A vector according to claim 9, wherein said vector is selected from
the group consisting of plasmids, viruses, and plant transformation vectors.

11. A host cell containing a DNA construct according to claim 2.

12. A host cell according to claim 9, wherein said host cell is an
animal cell or a plant cell.

13. A transgenic plant comprising transformed plant cells, said
transformed plant cells containing a DNA construct according to claim 2.

14. A transgenic plant according to claim 13, which is a monocot.

15. A transgenic plant according to claim 13, which is a dicot.

16. A transgenic plant according to claim 13, which plant is a dicot
selected from the group consisting of tobacco, potato, soybean, peanuts, cotton, and
vegetable crops.

17. A DNA construct comprising, in the 5' to 3' direction, a
transcription initiation region, a structural gene positioned downstream from said
transcription initiation region and operatively associated therewith, and a matrix
attachment region positioned either 5' to said transcription initiation region or 3' to said
structural gene, wherein said matrix attachment region has a sequence selected from
SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID
NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11 and SEQ ID
NO:13;

said DNA construct carried by a plant transformation vector.

18. A DNA construct according to claim 17, further comprising a
second matrix attachment region that differs in sequence from said matrix attachment
region.

19. A recombinant tobacco plant comprising transformed tobacco
plant cells, said transformed tobacco plant cells containing a heterologous DNA
construct comprising, in the 5' to 3' direction, a transcription initiation region functional
in plant cells, a structural gene positioned downstream from said transcription initiation
region and operatively associated therewith, and a matrix attachment region positioned
either 5' to said transcription initiation region or 3' to said structural gene,

wherein said matrix attachment region has a sequence selected from
SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID
NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11 and SEQ ID
NO:13.

20. A method of identifying matrix attachment regions in a DNA
molecule of known nucleotide sequence, comprising identifying a sequence section of at
least twenty contiguous nucleotides that is at least 90% A or T nucleotides, wherein the
presence of such a sequence section indicates a MAR encompassing said sequence
section.

21. A method according to claim 20, further comprising preparing a
MAR molecule of at least about 300 nucleotides, said MAR having a sequence which is
a contiguous fragment of said DNA molecule sequence and which encompasses said
identified sequence section of at least twenty contiguous nucleotides that is at least 90%
A or T nucleotides.
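
The sequence scan recited in claims 20 and 21 can be illustrated with a short Python sketch. It is offered only as one possible reading: the function names, the sliding-window approach, the merging of overlapping windows and the way the 300-nucleotide fragment is centered are all assumptions made for illustration, not steps specified in the claims.

```python
def at_rich_regions(seq, window=20, min_percent=90):
    """Return merged (start, end) spans where a window of `window`
    contiguous nucleotides is at least `min_percent` percent A or T."""
    seq = seq.upper()
    regions = []
    for start in range(len(seq) - window + 1):
        at_count = sum(base in "AT" for base in seq[start:start + window])
        # Integer comparison avoids floating-point rounding at the threshold.
        if at_count * 100 >= min_percent * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous span
            else:
                regions.append((start, end))
    return regions


def fragment_around(seq, region, min_length=300):
    """Return a contiguous fragment of at least `min_length` nucleotides
    (or the whole sequence, if shorter) that encompasses `region`."""
    start, end = region
    target = max(min_length, end - start)
    left = max(0, start - (target - (end - start)) // 2)
    right = min(len(seq), left + target)
    left = max(0, right - target)
    return seq[left:right]


# Synthetic example: a 20-nt AT stretch embedded in GC repeats.
example = "GC" * 200 + "AT" * 10 + "GC" * 290
hits = at_rich_regions(example)
print(hits, len(fragment_around(example, hits[0])))
```

Spans use 0-based, end-exclusive coordinates. A window that includes up to two non-A/T nucleotides still qualifies at the 90% threshold, so the reported span can extend slightly beyond the pure A/T stretch.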
[FIG. 1B through FIG. 1F and FIG. 2A through FIG. 2V (drawing sheets of WO 99/07866, PCT/US98/16344): restriction maps of the cloned inserts, each drawn between the T7 and T3 promoters and labeled with polylinker restriction sites (KpnI, ApaI, XhoI, SalI (HincII), HindIII, EcoRV, EcoRI, PstI, SmaI, BamHI, SpeI, XbaI, NotI, SacII, BstXI, SacI). Drawings not reproduced.]
[FIG. 3: nucleotide sequences of the isolated tobacco clones, including pS116-1, pS202-1, pS202-2, pS205-2, pS211-1, pS217-1 and pS220-1, printed in blocks of ten nucleotides with position numbers. The figure text is not legibly reproduced here; the sequences are given in the Sequence Listing.]
[FIG. 4 and FIG. 4A: clone, %A+T content, binding strength, and a graphic representation of motif positions along each clone. The graphic representations are not reproduced; the tabulated values are:]

CLONE       SEQ ID NO    %A+T                       BINDING STRENGTH
pS116-1     5            76.6                       90
RB7-6       20           73.2                       80
pS211-1     10           71.7                       70
pS220-1     13           73.1                       70
pS226-1     9            77.1                       60
pS205-2     8            71.4                       50
pS217-1     11           65 (decimal illegible)     40
pS202-1     6            68.3                       40
pS202-2     7            64.9                       40
pS115       4            65.7                       40
pS4         2            72.2                       20
pS1         1            71.8                       20
pS8         3            61.6                       0
pS218       12           62.0                       0
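
The %A+T values listed for each clone can be recomputed from a sequence with a few lines of Python. This is a minimal sketch under stated assumptions: the function name is hypothetical, and characters other than A, C, G and T (spaces and the position numbers printed in the Sequence Listing) are simply skipped.

```python
def percent_at(seq):
    """Percentage of A and T among the A/C/G/T characters of `seq`."""
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no nucleotides")
    return 100.0 * sum(base in "AT" for base in bases) / len(bases)


# A line copied from a sequence listing can be passed in as-is;
# the spaces and the trailing position number are ignored.
print(round(percent_at("aatt gcat 8"), 1))
```

Because non-nucleotide characters are filtered out, whole entries from the Sequence Listing can be pasted in without first removing the block spacing.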
[FIG. 11 and FIG. 12: plots of motif counts (labels include "ARS" and "Topo Sites") against Binding Strength. Plots not reproduced.]
SEQUENCE LISTING



<210> 1 
<211> 437 
<212> DNA 

<213> Nicotiana tabacum 



<400> 1 

gatacgtaaa caacgtgtat ccagtaagta tcaagcctaa tctcgaagtg gtagagacga 60

gatgaccgac tttgacactc actatgggtc aataataata actgaaataa aactaagata 120 

tttaaaccaa catgatttac agaatttaca ataatttatt taatcagcag aaataatcaa 180 

atttcttcaa atgtaacaat tctcaatata ttaattaaat tccttcaatt caaataattt 240 

ctaatttatc aattaaacct catttacagg agtaacaatt aattccttaa caagcaagaa 300 

taataattca ttaaattcca aggatttttc aatttattaa ttagcttcac aacctgaaat 360

aaattattaa agtatcgtgt aattattatt attaagcacg atttctgccg aggacatacg 420 
gcccgatcca gagtatc 437 



<210> 2 
<211> 587
<212> DNA 

<213> Nicotiana tabacum 



<400> 2 

gatactagag tggtgttatc aattcttact cgtatgaatt aattaaattt gtctcttatt 60
tctgtcctaa gtcatataca agaaatgcta actccatccg tttcaatccc tatgacatag 120 
tttgatttga ttgaatttga aaatttaaga aacaaaagat aatttttgtg actcataatt 180 
tagacatgtg ttataagact tttctcatga attttttaga aacaaatgat aatttttgga 240 
actcataatt tagacgtttt ataaaaaata ctaactgcat ctggttcaat atttatgtgt 300 

tattcctata aaacttctgg acttatattt ttaaatattt cataatattt ggtatcggta 360
taattttttt gtcacttttg gatgaaaggg aagtttaagt aaatttcttt ttccaaattt 420 
agaaagttat aatattcttt ttaaaacgcc caaaaagaaa aataagctat tgattattat 480 
aagcctaaac caaaagaatt ctttgactag taggaagcca tttttaagtt aggcgccaaa 540 
attcaaagcc aacgtgggca tatctccaaa ctggcggcta cagtatc 587 






<210> 3 
<211> 383 
<212> DNA 

<213> Nicotiana tabacum 



<400> 3 

accgctttta ttattattat ttttaccgag aattacaaca tcatgaaaat acatctcgaa 60 
ccacgtcaca tcaatgcacc cgcggttatt gacatatttc aactctgttg agatttggat 120 
ttgggtcaca taaatgtgca cccgagttta agaggataac attattaaat acgcgcctaa 180 
aacgactagc gtatcattat tttgggtagg gccgtgaaat tttgctaaac tgcccatcca 240
gaaatctaag taattttacc aacacgtata gagggcccca cagcttgtgt atttttgttt 300 
gtcgaggctc gtctcattca ttatttttaa aaggaatttg caacgtcgtg gaaatgcatc 360 
tcgaaccacg tcacaatcaa tga 383 

<210> 4
<211> 866 
<212> DNA 

<213> Nicotiana tabacum 



<400> 4
gaattcgata gactcactta aatattagaa gtgaattacc tagagttaga tccaaaacaa 60
ttatcttgca cctatcctat caacccttat cttttcccat tgattactac cttgcttacc 120 
tttgttacga ttttcattag acaataactt tagattctta gttaattgca gttagaaatt 180 
atattaaatt tcaattgttg gatcatcttg aataccaatc aagctagaaa atacaagaat 240 
actgtttaaa tcaaatccat gtggatacga tattatacta tattatattt gacttgtgag 300
cattatttat gtgtgttttg tgctcgtcaa agtttggcgt cgttgccgag gattggcaat 360 
caatagtgtt tgaaatagtt tttggtgcta atttaggaat taggttttat ttatttattt 420 
tttcttttct tttctttttc cttttctatt ttatttcctt tattagttaa cttcttttca 480 
agattttttt tgtagtacct aacaagttag agaagatact gtagattttg aactctaaat 540 

gttgtgaaga tggagtacaa ccagcctaag aaaatatttg aatagttagc agctgaacat 600
tatcggcggt cggttatgcg gtttaaatgc ggtggaagca tctaccaccg cagcctaaag 660 
aaaatatttt gaatagttag cagcttgaac attatcggcg gtcggttatg tgttttaaat 720 
gcggtggaaa tcatctacgg gctaactgtc aagcaggtat gtattcttcc tatggttcgt 780 
attttgagga gtctcactct gtttctagtt cgtacatgta tgaggattca tatgggcaca 840 

actctgactc tggttgggat gaattc 866

<210> 5 
<211> 998 
<212> DNA
<213> Nicotiana tabacum

<400> 5 

gaattgtatt attgttaggt gggagagatt tttgactata tgggttaaaa tcagcgacaa 60 
agggccaaat atacctattt acttttaaaa atagtctaat aatacctctc gttatattat 120 

taggttatct atacctttgc agtcatattt tgggttcaaa tatacccctc atttaaacgg 180
agggacacgt gtcatcgtcc tgttggtcaa ttctaaatat ctcctaatta attaaaaaga 240 
ctcattaccc atatccgaaa aatatttttt aaagcaatat ttttttataa aaaatggaaa 300 
aactgaaatt atttttacta aaaattgaaa aaaacgaaaa tagttttttt tcagttttta 360 
caaaaaaact attttagaaa aaattgaaaa atattttcta aaacaatgtt tttgtaaaaa 420 

ctgaaaaaaa agaagctgaa aatcaatttt ctaaagcaat tttatttgta aaatctggaa 480
aaaactacta aaaactgaaa aaatgaaaat attttttttt ctaattttta caaaaaaaac 540 
tgctttaaaa aaagctgaaa atattttcta aaacaatatt tttgtaaaaa ctaaaaaaaa 600 
aatattttct tctttttttc agtttttagt taaaaatatt taagtttttt ccagttttta 660 
attactttag aaaattactt ttctgctttt ttttcagttt ttacaaaaat attattttag 720 

aaaatatttt tcagttcttt aaagcagttt ttttttgtaa aaacttgaaa aacaatattt 780
tcgttttttt cagtttttag taaaatttgt ttttagtttt tttcagtttt taccaaaaat 840 
aaaattgctt tagaaaatta tttttcgggt atgggtaatg ggtcttttta attaattagg 900 
agatattttg aattgatcaa taggacgatg acacatgtcc ctccgtttaa atgaggtgta 960 
tatttgaatc caaagtaaga ctgcagcccg ggggatcc 998 


<210> 6 
<211> 635 
<212> DNA 

<213> Nicotiana tabacum 


<400> 6 

gaattcgata tggcttgttg gacaagaatt aatgaatcaa ttgtgaaaaa gttgatggac 60 
atattgaagg taaaatcata tactattttt ctaaaatctc ttttaaatgt tccccaatta 120 
tctgatttct atattgctct taaatgtcac tcaaccttag atcaacaaac atataactta 180 

cccagtacat aagagattgc ggcattatgg cttgaagaaa atcctagaga cacatctgca 240
ccacatattt gaatttatac ccacagtaat agagctcggt tagtacatta ttattatgga 300 
tgttacgatc cgttgcagta tccattatta ttttccttcg gtgaaaatgg atgacattgt 360 
ggaattaaaa aaattattca gacaaaaaat tcgacgaaac gtagagctta ctgcgaacat 420 
gaacaattgc ccagtatatc aaatacgtgt tcagttgatg gattccttga tatggaagat 480 

gaatcactac aaagaggaaa acgaaaaaga gatacagtgt cttgtcgaga gtattattgt 540
tacaaatttc aactaagaaa taatgaaaca aatgaagtgt tacattgtgg gagaatattc 600 
caacaattta tagtagatat atatatataa agctt 635

<210> 7 
<211> 1087 
<212> DNA

<213> Nicotiana tabacum 

<400> 7 

aagcttgcac gcctacatcg tgggataatt tagaaaaagg aaagggtata ttggatcccc 60 

ctatcatttg tgaaacaggt aaccatacga gaaccccttt cgcttcctga aaaatgttat 120

atattgttgt actcatattt atacactatt tattattaat ataacgatgc ttattttgct 180 

tggagattgg agattatcac agcttattta tcttatattg tatcttatta aacttaaaaa 240 

cataaatact acgtgctctt ttaatttggg atctattaag ggttcgttgc acgcttttaa 300 

acatcttggc tattctgttt accagctgct accttagcct gtatgcttac atcatctcct 360 

aatttagaca aaggaaaggg tatattggac cccccctatc attcgtgaaa caggtaaaca 420

tacattcaga ttatactctt ttcagaatga catattgttt atacattact gtaaattgtg 480 

actatttgta tattagggtc cacatcgggt acatctaacc tgcgtcatgt tatcttgaac 540 

actgttccaa tcaaaggttt gcacaaactt aatgttacaa tcatgtccac catacgtatg 600 

ccttggtgct cttttttttc ctaatgatac ttcttatata ttcagctcat aggcgggcca 660 

gaaaggtgtg cctggtcact aaagagcaac gaagtgagta tgttgctcta aaaagggtcc 720

cacactgtca attctgtcat ccaaagaagt ttgaatatga acctccagga ttttgctgta 780 

acagtggttc aataaggttg acatctcata aaatgccaac tgaattatcg gagttatact 840 

ttggaaatac tgaagaatct gaaaattttc gaacttatat tagaacatac aataacatgt 900 

ttgcatttac ttcacttggt gtcaagtatg ataaagagct agcgagaaga aattgtggta 960 

tctacacatt tagagtccag ggacagatgt atcattttat agatgattta gttccttcca 1020

atgaaaaacc taggaattta tagctgtact tctacgataa tgataatgaa ctagccaaat 1080 

caagctt 1087 

<210> 8 
<211> 704
<212> DNA 

<213> Nicotiana tabacum 
<400> 8 

gaattcttca gccattgtac atatagttgt gtattaatgt tattaataat ggataattaa 60

atatatacct ggaataaata tacgatatta taatagtgtg taattatata taaaaattat 120 

acataatata atgatggtat ttaatatagc ataaatttga acgatctgga ttgatttctt 180 

gaatcaaaat agagttgtgt gaaaagaaaa gaatgagatg aaaagcaaag tatgaagaga 240 

tgaatttgtg ttttttttat ggaggaggaa ggttctcagt gatggaatca tccctggttt 300 

tctttagcac caatgaaagt aatgaacccc ccccaaaaaa aaaaaaaaaa aaaaaaaagg 360

gagagagagt agaatggaac ggctaggtga aagtatagga gtagaaatta ggttcaggga 420 

gagaaaaggg gggaaattaa ttcctaaatt aatgggattc taatttttaa actgttttga 480 

aatattttaa aagtagtgtt atttatatta ttaactttta aaaaaagtca aacgaggtaa 540 

aaattccatg ggggaaaatt taaatggtta gtcttctata atattttcaa ctctgcttag 600 

cactaaaaat tagtctaaaa ataaccctaa attagtgtat ctaaattaat tagttcatcg 660
aacaggagca ttggattatc cctccagagt tacacaggaa gctt 704 

<210> 9 
<211> 306 
<212> DNA

<213> Nicotiana tabacum 

<400> 9 

ggatccagct attattatag catgtgagtt gtccgtgaac agctaatttt ttaccacacc 60 

caaattcaat actattttag tgtaaatata tcttttaggt ctagtcttaa tatttaactt 120

tttgtcttac ttttaataga ttttatttga gaaaaattaa taattacaaa aaataaaaag 180 
tatatattca catacttata gtacaaactt tgtttctatt tataaagaga aaaagaaatt 240
ttacaaaaaa caaatatatt tgctttcttt taattagtag ttttattaag caagctatag 300 
aagctc 306 



<210> 10
<211> 685
<212> DNA

<213> Nicotiana tabacum

<400> 10

gaattccgtg gttttagcac ggtcgctcaa ttgtcatatt tggctcattt atctgatttt 60
taaacaatta agaacttata tgcaaattta acttttaaaa ccgcttttat cattatttat 120
tttatacaaa attacaacgt cgtgaaaagg catctcgaac cacgccacaa ccagtgcaca 180
cgtgatttgt tgacgcattt tggacttcgt caagatcgtg atttgggtta cataaatgta 240
caccccgtat ttaagaaaat aaccttatta aatattgcgc caaaatacta cgcgttatga 300
tactattagg gtaggcttgt gaattttact aaatcgccca tctcggaatc taggtatttt 360
cttatattaa aaaaaataag atgggggcct gcaatttttt attatttaat atttatttat 420
tttttagcga agatccctcc cttattttat gaataccctt taatgactac atctttatta 480
ttactaagtt tgtctataat tatgaagtca atctctacat acataaaaat aacatattaa 540
ttactaattt aaaacaaata ttaatggaaa gtaatattac taaaattata attacaaaca 600
acatggaatt gtcacaaaat aaaaaataaa aactaattat cccatagttg gattaaaatt 660
catattgtta gtatgactta agctt 685



<210> 11 
<211> 899
<212> DNA 

<213> Nicotiana tabacum 



<400> 11 

aagctttaaa aggaagagag ccacaatttt ctttgacctt ccttctctcc tagccactaa 60

gatatacagt actggtcaaa aagagcatat ttatagctca aaattttgcc tttttctgtt 120 

gtaaacgtga ttgtttctta cttggattct tgttctatat atttacggga gaaaagagca 180 

atttgcatgc tcctaaatct tttattttct ggtgaaaaat tggtctttaa ttggctggga 240 

attatttttt agatgctaca accttgacaa acacctaaga atattttagt gacaatggct 300 

tgttctttga gtactggttt ttctgtttct ggtccctgtt tcaacgccac agccaaagag 360

tctcgtcgtc attgcccttc gattggcact ctgcaactta aagatttagc atccagagaa 420 

tttctaggca aacccttgga ttatgcatca gatcatattg gtactaacca ttggaatgtt 480 

gaacgacttt ctgtatgtaa atctctgata catttgcttc tgtgtttata cttggtgttt 540 

tcatgttttc attcttgttt taaatttttc gagatcaaat catttataag tatttattct 600 

aatgatttta ggcacaagta tcaatcgctg ctcagagatg gtgggagaag acccttaaac 660

ccaacatggt agagatcaat tcagcaacac aacttgttga ttcattatta aaagctggta 720 

atagattggt cataattgac ttcttctctc ctagctgtag aggtttcaag actttacatc 780 

ctaaggtaag atatatagca atcccctaaa aaaaaaaaaa aaaaaaaaaa aaaccaacaa 840 

ctacatcgta atcctaagca agttagggtt aactatatga atcatcacta gacggatcc 899 


<210> 12 
<211> 999 
<212> DNA 

<213> Nicotiana tabacum 


<400> 12 

aagcttaact ttactcacat tgctttcttt agggaagcgt cttcttaaat gaccatcctc 60 

taaatttctc atgaatcttc ttctgttgtc cactctgtta tcgctgaaac gaaatctgaa 120 

attgtcatga tgctgactat tatccaatca ctcagtctct aattcatatt tagattatct 180 

tgttcaccag cccatactga tttttattgt tttggggtct aacttttcct tccggtagtc 240

ggttggagtc atgaacttat ttcttgaaat gaggatatga ctttatggcc tatactcttt 300 
tggtgtctca aggcctgtca cctctcatct tttccttcaa ttgactatag actctgtaat 360
actgtcatct ttgggatcta ccgttgtcct ccatgtatca tatcttactc ataatgcttc 420 
attaactatt ttcttatttc ccgctaacat ttatgtctat cactttattc tgaaaactcg 480 
aacaagacat tcttttcgtt ttagatcccc tttgctccat ccagtggttc ttcgggggac 540 

ttaacgttct cgctctccta gggaggcgag ccacactaag gtaatattta tcccttctag 600
gctttccgtg cctatcttct gagatatttt tttcatgcta atattcacat ctaattgtaa 660 
ttttctagag tgcgccatct gggtgcctca caagaagagc tattagcatc tttgtaatat 720 
ccttcggaaa tgtcaactaa cacaacacaa tccattcacc attttgggtt actctaacct 780 
cagtcggata ctaatatcct gtcattttat taaactacac atgttagccc ccaataggat 840 

ataactaaga tgggtgtggc caattctaca tacatctgtt actgttgaaa gtaagtcgca 900
atgcttttat ttttctgccg gagttgaaaa taccgataat ctatattaac tgggtacctc 960 
gtacccttct catctttctc cttttacttg ttgaagctt 999 

<210> 13 
<211> 1499
<212> DNA 

<213> Nicotiana tabacum 
<400> 13 

aagcttgaaa aagaagaatt aaggcttgct ttcttaattt ttaaaaaata aaaattattt 60
tgaactatct atactatatt aaaagcacga aaaccctatc gaaatgtcgt tcgccttttt 120 
taccctttaa aaataatttt acattagaca aaatagtcat tttactattt ttcctaatat 180 
ataggatttt aaaattaatt taactttggc tattaaacat tttcttataa cttgaaatat 240 
gtaaaactcc taatatttag aaatttaatt aacataacca aggattttta tatcggtaat 300 

aactctaata tggtatccaa atcagtctag aactctctta cctctaataa gtaaaagtac 360
ttctaataaa ttcatatact ttttctctct tctccgatct ctctttgctc ttctttttat 420 
gtatcctttc ctttctaata gccttttatg agaagtaaac ttttagggtt ggccccccct 480 
ccccccacaa ttatatagtt tcttactcag ttgttggaat ataattcaaa ttcttaaata 540 
attgacggtg acattgagtt ttactttgtg gaagagaatt agattctcgt gttagtaaaa 600 

tcggttagta attgatgatg cattattttt actctataat agagatgcaa ttttattttt 660
gcattttggg atcaaattgt aatgcagtca tatattgatt tcataaatgt ttgggatatt 720 
gttggttatt taactagaaa tagacttctt atttcatatt tattgttaaa atcctttatt 780 
ggagatgaat tatttgttca ccgattagaa gttgatagtc gcttttgttt tagaagaaat 840 
tttaccgtag accaagttaa ggagttttag aagcactttg catgggagca ttagtgtatg 900 

ttatggcttt atcaaatata ggttttgaag attcagagag ccaagaaaag ctagaaccca 960
agaactagga agttagagta attcacaata ccataacgtg atataaaact ttttattgta 1020 
actcaaatcg gtaatatttt ttgctttagt cttaatcgat aaattatttt tttatattga 1080 
ttagttatag gaggctcaca aagttgggaa taattaaaat atcatatttt gtatttgaac 1140 
aatttatgaa atagtaattg gtaaaaaatc actttaaatt tttatcctat atccagaagg 1200 

attatggtgt ctggcatagt tgtttggaag atttgaatca gggtaaaagt atgttgtaat 1260
ttttattttg ttataggcat tttttgtgct tgattgtttt gttgtcatta tattttatta 1320 
tttggaagtg tatatatatg tttgattaaa atatagataa tcaattttat aagaaatttg 1380 
caacaattac acaaggataa agtctacaat atgcgagtaa aatttgattg aacctaggat 1440 
gtcatattta atgcatattt tatttcaatg tgtttattat acatctattg tattatatg 1499 


<210> 14 
<211> 10 
<212> DNA 

<213> Nicotiana tabacum 


<400> 14 

aataaayaaa 10 

<210> 15 
<211> 10
<212> DNA
<213> Nicotiana tabacum

<400> 15

ttwtwttwtt 10



<210> 16
<211> 11
<212> DNA

<213> Nicotiana tabacum

<400> 16

wtttatrttt w 11
<210> 17
<211> 15
<212> DNA

<213> Nicotiana tabacum
<400> 17

gtnwayattn atnnr 15

<210> 18
<211> 18
<212> DNA
<213> Nicotiana tabacum

<400> 18

tgtaaaacga cggccagt 18

<210> 19
<211> 18
<212> DNA

<213> Nicotiana tabacum
<400> 19

caggaaaccg atatgacc 18

<210> 20
<211> 1103
<212> DNA

<213> Nicotiana tabacum 

<400> 20 

tcgattaaaa atcccaatta tatttgcaga ttaaatcaaa ccataactca ttttgtttaa 60 

gcttggtttg gttttatatt tatataagtt tttatatata tgcctttaag actttttata 120

gaattttctt taaaaaatat ctagaaatat ttgcgactct tctggcatgt aatatttcgt 180 

taaatatgaa gtgctccatt tttattaact ttaaataatt ggttgtacga tcactttctt 240 

atcaagtgtt actaaaatgc gtcaatctct ttgttcttcc atattcatat gtcaaaatct 300 

atcaaaattc ttatatatct ttttcgaatt tgaagtgaaa tttcgataat ttaaaattaa 360 

atagaacata tcattattta ggtatcatat tgatttttat acttaattac taaatttggt 420

taactttgaa agtgtacatc aacgaaaaat tagtcaaacg actaaaataa ataaatatca 480 

tgtgttatta agaaaattct cctataagaa tattttaata gatcatatgt ttgtaaaaaa 540 

aattaatttt tactaacaca tatatttact tatcaaaaat ttggcaaaac cgaaccaatc 600 

caaccgatat agttggtttg gtttgatttt gatataaacc gaaccaactc ggtccatttg 660 

cacccctaat cataatagct ttaatatttc aagatattat taagttaacg ttgtcaatat 720

cctggaaatt ttgcaaaatg aatcaagcct atatggctgt aatatgaatt taaaagcagc 780 
tcgatgtggt ggtaatatgt aatttacttg attctaaaaa aatatcccaa gtattaataa 840

tttctgctag gaagaaggtt agctacgatt tacagcaaag ccagaataca aagaaccata 900 

aagtgattga agctcgaaat atacgaagga acaaatattt ttaaaaaaat acgcaatgac 960 

ttggaacaaa agaaagtgat atattttttg ttcttaaaca agcatcccct ctaaagaatg 1020 

gcagttttcc tttgcatgta actattatgc tcccttcgtt acaaaaattt tggactacta 1080
ttgggaactt cttctgaaaa tag 1103 



<210> 21
<211> 838
<212> DNA

<213> Nicotiana tabacum

<400> 21

aagcttacat tttatgttag ctggtggact gacgccagaa aatgttggtg atgcgcttag 60
attaatggcg ttattggtgt tgatgtaagc ggaggtgtgg agacaaatgg tgtaaaagac 120
tctaacaaaa tagcaaattt cgtcaaaaat gctaagaaat aggttattac tgagtagtat 180
ttatttaagt attgtttgtg cacttgcctg caggcctttt gaaaagcaag cataaaagat 240
ctaaacataa aatctgtaaa ataacaagat gtaaagataa tgctaaatca tttggctttt 300
tgattgattg tacaggaaaa tatacatcgc agggggttga cttttaccat ttcaccgcaa 360
tggaatcaaa cttgttgaag agaatgttca caggcgcata cgctacaatg acccgattct 420
tgctagcctt ttctcggtct tgcaaacaac cgccggcagc ttagtatata aatacacatg 480
tacatacctc tctccgtatc ctcgtaatca ttttcttgta tttatcgtct tttcgctgta 540
aaaactttat cacacttatc tcaaatacac ttattaaccg cttttactat tatcttctac 600
gctgacagta atatcaaaca gtgacacata ttaaacacag tggtttcttt gcataaacac 660
catcagcctc aagtcgtcaa gtaaagattt cgtgttcatg cagatagata acaatctata 720
tgttgataat tagcgttgcc tcatcaatgc gagatccgtt taaccggacc ctagtgcact 780
taccccacgt tcggtccact gtgtgccgaa catgctcctt cactatttta acatgtgg 838

<210> 22
<211> 9
<212> DNA

<213> Nicotiana tabacum

<400> 22

aatatattt 9






INTERNATIONAL SEARCH REPORT 



International Application No

PCT/US 98/16344 



A. CLASSIFICATION OF SUBJECT MATTER

IPC 6 C12N15/82 C12N5/10 A01H5/00 C12Q1/68 



According to International Patent Classification (IPC) or to both national classification and IPC 



B. FIELDS SEARCHED 



Minimum documentation searched (classification system followed by classification symbols) 

IPC 6 C12N A01H C12Q 



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practical, search terms used) 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category ° Citation of document, with indication, where appropriate, of the relevant passages 



Relevant to claim No. 



WO 97 27207 A (NORTH CAROLINA STATE 
UNIVERSITY) 31 July 1997 

see page 2, line 9 - line 34 
see page 6, line 16 - page 7, line 29 
see page 12, line 27 - page 13, line 28 
see page 21, line 3 - page 25, line 21 

-/- 



20,21 
1-19 



Further documents are listed in the continuation of box C. 



Patent family members are listed in annex.



° Special categories of cited documents:

"A" document defining the general state of the art which is not
considered to be of particular relevance

"E" earlier document but published on or after the international
filing date

"L" document which may throw doubts on priority claim(s) or
which is cited to establish the publication date of another
citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
other means

"P" document published prior to the international filing date but
later than the priority date claimed

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such documents,
such combination being obvious to a person skilled
in the art.

"&" document member of the same patent family



Date of the actual completion of the international search 



15 December 1998 



Date of mailing of the international search report 



22/12/1998 



Name and mailing address of the ISA 

European Patent Office, P.B. 5818 Patentlaan 2
NL - 2280 HV Rijswijk 
Tel. (+31-70) 340-2040. Tx. 31 651 epo nl. 
Fax: (+31-70) 340-3016 



Authorized officer 



Montero Lopez, B 



Form PCT/ISA/210 (second sheet) (July 1992)









C.(Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT

Category ° Citation of document, with indication, where appropriate, of the relevant passages

Relevant to claim No.



GEORGE C. ALLEN ET AL.: "High-level 
transgene expression in plant cells: 
Effects of a strong Scaffold Attachment 
Region from Tobacco" 
THE PLANT CELL, 

vol. 8, May 1996, pages 899-913, 
XP002072398 

see page 900, left-hand column, paragraph 
2 

see page 908, right-hand column, paragraph 
1 - page 909, left-hand column, paragraph 
2 

WO 94 07902 A (NORTH CAROLINA STATE 
UNIVERSITY) 14 April 1994 
see page 2, line 5 - page 3, line 21 
see page 4, line 1 - page 5, line 17 
see page 10, line 5 - page 11, line 5 



20,21 



1-19 



1-19 



Form PCT/ISA/210 (continuation of second sheet) (July 1992)






INTERNATIONAL SEARCH REPORT 

Information on patent family members

International Application No

PCT/US 98/16344 



Patent document          Publication    Patent family      Publication
cited in search report   date           member(s)          date

WO 9727207               31-07-1997     US 5773695 A       30-06-1998
                                        AU 2246997 A       20-08-1997

WO 9407902               14-04-1994     AU 673859 B        28-11-1996
                                        AU 5165593 A       26-04-1994
                                        CA 2147006 A       14-04-1994
                                        EP 0663921 A       26-07-1995
                                        US 5773689 A       30-06-1998



Form PCT/ISA/210 (patent family annex) (July 1992)



